1  Introduction


Our goal is to develop a fully automated classification scheme for computer-aided diagnosis
(CAD) in mammography. Traditional CAD classification schemes, and performance
measurement tools such as receiver operating characteristic (ROC) analysis, are based on
the premise that the observations are classified into two groups, most commonly malignant
and benign. Such classification schemes are difficult to fully automate because they analyze
radiologist-identified lesions: many false-positive (FP) detections produced
by a computerized detection scheme cannot reasonably be classified as benign or malignant
lesions. Our proposed scheme would classify computer detections into three groups: malignant
lesions, benign lesions, and FP computer detections. This method presents considerable
difficulties in terms of both signal detection theory and performance evaluation methods such
as ROC analysis. Our efforts in this direction have therefore been more theoretical than
practical to date, but the results are promising.

2  Body 

A  wide  variety  of  medical  decision-making  tasks,  in  particular  tasks  for  which  CAD  has  been 
proposed  as  an  aid  to  the  physician,  can  be  formulated  as  “two-group  classification”  tasks. 
That is, the physician must use the information available about a patient (e.g., a set of
mammographic films of the patient, and the result of computer analysis of those images) to
decide whether a patient belongs to a diseased, or abnormal, group or not (e.g., whether a
breast lesion suspicious enough to warrant further imaging procedures or biopsy is present
or not).

ROC analysis has long been considered the most appropriate methodology for evaluating
the performance of a two-group classifier or observer [1], particularly for medical
decision-making tasks [2]. Furthermore, the optimal or “ideal” observer (that observer which
achieves the best possible performance given a particular population of observational data)
has also been well understood for quite some time [3]. In practice, the ideal observer
requires knowledge of the probability density functions (PDFs) from which the observational
data are drawn, and thus cannot be achieved in non-trivial tasks by human or automated
observers. Nevertheless, successful methods for estimating ideal observer decision variables
from a sample of observational data [4], and for plotting an ideal observer ROC curve from
a sample of decision variable data [5], have been developed.
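As a rough illustration of the two-group case, the following Python sketch sweeps a decision threshold over a sample of decision-variable values and records the resulting operating points (false-positive fraction, true-positive fraction). It is only an empirical, nonparametric construction on made-up data; it is not the curve-fitting method of [5], and the function names are hypothetical.

```python
def empirical_roc(abnormal_scores, normal_scores):
    """Return (FPF, TPF) operating points swept over all observed thresholds.

    A case is called "abnormal" when its decision variable is >= the threshold.
    """
    thresholds = sorted(set(abnormal_scores) | set(normal_scores), reverse=True)
    points = [(0.0, 0.0)]  # strictest criterion: nothing is called abnormal
    for t in thresholds:
        tpf = sum(s >= t for s in abnormal_scores) / len(abnormal_scores)
        fpf = sum(s >= t for s in normal_scores) / len(normal_scores)
        points.append((fpf, tpf))
    return points


def trapezoidal_auc(points):
    """Area under the piecewise-linear curve through the operating points."""
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        area += (x1 - x0) * (y0 + y1) / 2.0
    return area


# Hypothetical decision-variable values, for illustration only.
malignant = [0.9, 0.8, 0.7, 0.4]
benign = [0.6, 0.3, 0.2, 0.1]
roc = empirical_roc(malignant, benign)
print(roc)
print(trapezoidal_auc(roc))
```

The trapezoidal area computed this way equals the fraction of (abnormal, normal) pairs that the decision variable ranks correctly, which is one reason the two-group summary index is so convenient.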

Although the form of the three-group ideal observer has also been known for some time [3],
the development of a practical three-group classifier and a fully general extension of ROC
analysis to three-group classification has proven quite difficult, primarily due to the
tremendous increase in complexity encountered when one moves from two-group to three-group
classification tasks. Briefly, characterizing the performance of a three-group classifier requires an
ROC “hypersurface” with five degrees of freedom in a six-dimensional ROC space [6,7] (by
contrast, a two-group classifier is fully described by a simple curve in a two-dimensional ROC
space). Despite these difficulties, our research efforts are focused on the development of a
three-group classifier and performance evaluation methodology for breast lesion classification
in a mammographic CAD system.
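To make the dimensionality concrete: with three groups there are nine conditional decision probabilities P(decide i | truth j), and because the three probabilities for each truth state sum to one, six of them (for instance, the six misclassification probabilities) suffice to specify one operating point. A minimal Python sketch, assuming a hypothetical 3x3 table of decision counts and illustrative function names:

```python
def conditional_probabilities(counts):
    """Convert a table of counts into P(decide i | truth j).

    counts[j][i] is the number of class-(j+1) cases decided as class (i+1).
    """
    probs = []
    for row in counts:
        total = sum(row)
        probs.append([c / total for c in row])
    return probs


def roc_operating_point(counts):
    """Return the six misclassification probabilities, keyed by (decided, truth)."""
    probs = conditional_probabilities(counts)
    n = len(counts)
    return {
        (i + 1, j + 1): probs[j][i]
        for j in range(n)
        for i in range(n)
        if i != j
    }


# Hypothetical counts: rows are truth (malignant, benign, FP detection),
# columns are the decision made for those cases.
counts = [[80, 15, 5],
          [20, 60, 20],
          [10, 30, 60]]
for (decided, truth), p in sorted(roc_operating_point(counts).items()):
    print(f"P(decide {decided} | truth {truth}) = {p:.2f}")
```

A single table of counts gives a single point; the hypersurface is traced out as the observer's decision criteria are varied.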

We strongly believe the development of such a three-group classifier to be of practical and
not merely academic importance. In the past, two types of mammographic CAD schemes
have been investigated at the University of Chicago: one for automatically detecting mass
lesions in mammograms [8-12], and one for classifying known lesions as malignant or
benign [13-17]. Combining these two types of CAD scheme is inherently difficult, because
the output of the detection scheme will necessarily include FP computer detections in
addition to the malignant and benign lesions to be classified. These FP computer detections
correspond to objects which were by design not included in the training sample of the
classification scheme, because they are not members of the data population (benign and malignant
mass breast lesions) for which the classification scheme was created. It is clear then that
the detection scheme’s output cannot be used unmodified as the input to the classification
scheme.

Our  approach  has  been  to  treat  this  problem  explicitly  as  a  three-group  classification 
task.  That  is,  the  output  of  the  detection  scheme  should  be  classified  as  malignant  lesions, 
benign  lesions,  and  non-lesions  (FP  computer  detections),  and  the  classifier  to  be  estimated 
is  the  ideal  observer  decision  function  for  this  task.  If  successful,  this  approach  would  allow 
radiologists  to  identify  more  malignant  lesions  without  increasing  biopsy  rates  for  patients 
without  malignancy. 

Our  approved  Statement  of  Work  is  as  follows: 

Task  1.  Develop  a  three-group  classifier  for  clustered  microcalcifications  in  mammograms,  Months 
1-12. 

(a)  Collect cases containing 180 malignant and 180 benign clusters of microcalcifications.

(b)  Determine truth state of imaged lesions by reviewing the images, radiologist reports,
and pathology reports for these cases.

(c)  Obtain  at  least  180  FP  computer  detections  from  these  cases  using  the  existing 
detection  scheme. 

(d)  Train  and  test  a  three-group  classifier  on  these  lesions,  using  methodology  we 
previously  developed  for  mass  lesions. 

Task  2.  Design  and  develop  an  interface  for  an  intelligent  workstation  for  CAD,  Months  11-14. 

(a)  Examine  the  most  useful  features  of  the  interface  of  the  existing  intelligent  CAD 
workstation  for  mammographic  lesion  detection. 

(b)  Examine  the  most  useful  features  of  the  interface  of  the  existing  CAD  schemes  in 
our  laboratory  for  classifying  manually  detected  lesions  as  malignant  or  benign. 

(c)  Develop  a  simple  interface  drawing  on  the  advantages  of  the  existing  detection 
and  classification  schemes,  extended  to  the  three-group  classification  task. 

(d)  Test  the  interface  with  non-radiologist  observers  in  our  laboratory  familiar  with 
the  goals  of  CAD  and  with  interface  design  principles. 

Task  3.  Design  and  perform  a  pilot  observer  study  measuring  radiologists’  performances  using 
the  three-group  classification  schemes  and  traditional  two-group  classification  schemes, 
Months  15-24. 

(a)  Recruit  radiologists  from  our  institution  and  neighboring  institutions. 
(b)  Provide training to the radiologists in the use of the intelligent CAD workstation
interfaces.

(c)  Measure  radiologist  performance  using  the  three-group  intelligent  workstation, 
and  using  the  existing  intelligent  workstation  for  detecting  lesions  followed  by 
manual  selection  of  lesions  to  be  analyzed  by  the  existing  schemes  for  two-group 
classification  of  lesions. 

Task 4.  Develop techniques to compare radiologists’ performance in using the proposed
three-group and traditional two-group classification schemes, Months 18-36.

(a)  Develop methodology to extend two-group ROC analysis to tasks in which observations
are classified into three groups.

(b)  Develop methodology to determine the statistical significance of measured differences
in performance between three-group classifiers.

(c)  Use  this  methodology  to  analyze  the  observer  data  obtained  in  Task  3. 

For  Tasks  1(a)  and  1(b),  we  have  collected  a  database  of  134  mammographic  cases,  four 
standard  views  per  case;  the  majority  of  these  cases  contain  malignant  or  benign  clustered 
microcalcification  lesions.  The  truth  for  the  malignant  microcalcification  lesions  is  verified 
by  pathology  report,  and  that  for  the  benign  lesions  by  pathology  report  when  biopsy  was 
recommended,  and  by  followup  when  that  was  recommended  by  the  original  radiologist. 
This  is  less  than  the  number  of  malignant  and  benign  lesions  initially  proposed  for  this 
project,  but  we  will  have  the  opportunity  to  supplement  these  with  further  such  cases  from 
the  database  of  a  colleague  in  our  laboratories. 

For  Tasks  1(c)  and  1(d),  we  initially  encountered  difficulties  porting  the  computer  code 
for  the  existing  detection  scheme  from  the  legacy  equipment  for  which  it  was  written  (IBM 
RISC  6000  machines,  whose  operating  systems  are  no  longer  supported  and  whose  hardware 
is  too  old  to  be  considered  reliable)  to  a  modern  PC  workstation  running  a  Linux  operating 
system.  These  difficulties  were  traced  to  compiler  incompatibilities  between  the  two  systems. 
A  computer  programmer  in  our  laboratory  with  extensive  experience  with  both  systems 
and  intimate  familiarity  with  the  internals  of  the  detection  scheme  has  investigated  and 
eliminated  the  majority  of  these.  It  is  anticipated  that  completion  of  Task  1  will  require 
another  quarter  year  of  effort. 

Our research accomplishments to date have focused largely on Task 4. Although the
“methodology we previously developed for mass lesions” [18] was successful for estimating
ideal observer decision variables based on lesion feature data, a practical classifier to make
use of this decision variable data has not yet been implemented. As the difficulties in
theoretically characterizing the behavior of such a three-group classifier are intimately related to
evaluation of such a classifier’s performance (i.e., the development of a three-group extension
to ROC analysis), such a reordering of the approved tasks seems logically justified.

We investigated in great detail the behavior of the three-group ideal observer. In particular,
it is well known that the three-group ideal observer makes decisions by partitioning a
plane of two decision variables into three regions using three decision boundary lines [3]. We
showed that the locations and orientations of these decision boundary lines are not arbitrary;
given the slopes and y-intercepts, for example, of two of the lines, those of the third line are
constrained to lie within a particular range of values [19]. (See Appendix A.) A detailed
understanding of such properties of the three-group ideal observer will prove crucial to the
calculation of observer ROC operating points, and by extension to observer performance
evaluation in general.

In our efforts to develop a three-group classifier and appropriate performance evaluation
methodology, we have made every attempt to keep our analysis as general as possible
despite the theoretical difficulties this entails. Other researchers have proposed three-group
methodology by considering observers whose behavior is restricted in particular ways, or by
considering only a subset of the possible performance characterization indices (the axes of
ROC space), or both [20-24]. The inherent complexity of the three-group classification task
makes direct comparison of different methods by different researchers difficult. To facilitate
such  a  comparison,  we  analyzed  the  different  methods  in  terms  of  the  three-group  ideal 
observer  [25].  (See  Appendix  B.)  In  addition  to  providing  us  with  valuable  insight  and 
experience  in  comparing  different  classifiers,  which  should  ultimately  prove  directly  relevant 
to  the  completion  of  Task  4,  this  work  also  enabled  us  to  present  to  the  observer  performance 
and  CAD  research  communities  a  useful  framework  within  which  comparison  of  superficially 
very  different  classifiers  can  readily  be  made.  A  poster  presentation  of  the  theoretical  results 
of  this  and  the  preceding  paragraph,  as  well  as  our  research  accomplishments  during  the  first 
year  of  this  award,  was  made  at  the  2005  US  DOD  Breast  Cancer  Research  Program  Era  of 
Hope  Meeting  in  Philadelphia,  PA  [26]. 

Most recently, we analyzed a simplified performance evaluation method (i.e., an extension
of ROC analysis to tasks with three groups) which considers only the three “sensitivities” of
the observer, that is, the three probabilities of correctly identifying an observation from one of the
three respective groups. (This can, in general, be expected to yield an incomplete description
of observer performance, which requires a set of six conditional classification probabilities [7].)
This method was originally proposed by Mossman [22] for a pair of essentially ad hoc decision
rules and arbitrary decision variables, and more recently advocated by He et al. [24] for a set
of ideal observer decision variables and a decision rule shown [24,25,27] to be a special case of
the ideal observer decision rule, and also shown [25,27] to be a special case of the decision rule
proposed by Scurfield [21]. We were able to derive a more fundamental motivation for the
decision rules described in those works, given the simplified performance description in terms
of only the sensitivities, by applying previously successful Neyman-Pearson optimization
methodology [3, 7] to this restricted performance evaluation strategy.
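The parenthetical caveat in this paragraph can be illustrated numerically. In the following Python sketch (hypothetical numbers and function names), two observers share the same three sensitivities yet distribute their errors differently, so they occupy different points in the full six-dimensional ROC space:

```python
def sensitivities(probs):
    """P(decide i | truth i) for each class; probs[j][i] = P(decide i | truth j)."""
    return [probs[i][i] for i in range(len(probs))]


def misclassification_probabilities(probs):
    """The off-diagonal entries P(decide i | truth j), i != j, in row order."""
    n = len(probs)
    return [probs[j][i] for j in range(n) for i in range(n) if i != j]


# Two hypothetical observers; each row sums to one.
observer_a = [[0.8, 0.15, 0.05],
              [0.2, 0.6, 0.2],
              [0.1, 0.3, 0.6]]
observer_b = [[0.8, 0.05, 0.15],
              [0.1, 0.6, 0.3],
              [0.3, 0.1, 0.6]]

print(sensitivities(observer_a) == sensitivities(observer_b))
print(misclassification_probabilities(observer_a)
      == misclassification_probabilities(observer_b))
```

Whether the difference between such observers matters depends on the relative costs of the different kinds of error, which is exactly the information a sensitivity-only description leaves out.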

Simply  put,  assuming  that  one  chooses  to  measure  observer  performance  only  in  terms 
of  the  observer’s  sensitivities,  we  proved  [28]  that  the  optimal  observer  with  respect  to  this 
metric is in fact the special case of the ideal observer proposed by He et al. [24]. (See
Appendix C.) We then applied this analysis technique [29] to other decision strategies and
performance  evaluation  strategies  which  we  had  previously  analyzed  in  terms  of  the  ideal 
observer  decision  rule  [25].  (See  Appendix  D.)  Given  the  difficulties  inherent  in  a  fully  general 
description  of  three-class  ideal  observer  behavior  and  performance  evaluation,  it  is  possible 
that  a  restricted  or  simplified  model,  similar  to  those  proposed  already  by  other  researchers, 
may  ultimately  prove  of  greater  practical  value  than  the  fully  general  theoretical  model. 
We  consider  this  work  important,  because  it  provides  a  principled  theoretical  framework  in 
which  to  evaluate  and  compare  such  restricted  and  simplified  models. 

A  detailed  understanding  of  the  properties  of  the  general  three-group  ideal  observer,  and 
of  the  restricted  and  simplified  models  described  above,  will  prove  crucial  to  the  calculation 
of  observer  ROC  operating  points,  and  by  extension  to  observer  performance  evaluation  in 
general.  Since  the  initiation  of  funding  for  this  project,  the  principal  investigator  and  mentor 
have  been  holding  regular  meetings  to  discuss  the  theoretical  challenges  posed  by  this  project 
and to explore possible ways of overcoming those challenges.


3  Key  Research  Accomplishments 

•  Detailed  investigation  of  the  relationships  among  the  decision  boundary  lines  used  by 
the  three-group  ideal  observer  (Appendix  A) 

•  Analysis  of  several  proposed  three-group  classification  methods  in  the  literature  in 
terms  of  the  three-group  ideal  observer  (Appendix  B) 

•  Development of principled theoretical motivation for proposed three-group classification
methods given selection of restricted or simplified three-group evaluation methodology
(Appendices C, D)

4  Reportable  Outcomes 

•  Collection  of  database  of  134  mammographic  cases  containing  malignant  and  benign 
clustered  microcalcification  lesions,  with  truth  determined  by  pathology  (for  biopsied 
lesions)  or  mammographic  followup  (benign  lesions  only) 

•  Porting  of  existing  computerized  scheme  for  detecting  clustered  microcalcifications  in 
mammograms  from  legacy  computer  systems  no  longer  in  operation  to  workstations 
currently  in  use  for  this  project 

•  D. C. Edwards and C. E. Metz, “Restrictions on the three-class ideal observer’s decision
boundary lines,” IEEE Trans. Med. Imag., vol. 24, pp. 1566-1573, 2005.

•  D. C. Edwards and C. E. Metz, “Analysis of proposed three-class classification decision
rules in terms of the ideal observer decision rule,” J. Math. Psychol., 2005, (accepted
for publication 5/25/06).

•  D.  C.  Edwards,  C.  E.  Metz,  R.  M.  Nishikawa,  and  M.  L.  Giger,  “Investigation  of 
three-group  classifiers  to  fully  automate  detection  and  classification  of  breast  lesions 
in  computer-aided  diagnosis  for  mammography,”  US  DOD  Breast  Cancer  Research 
Program  Era  of  Hope  Meeting,  Philadelphia,  PA,  2005. 

•  D. C. Edwards and C. E. Metz, “Optimization of an ROC hypersurface constructed
only from an observer’s within-class sensitivities,” in Proc. SPIE Vol. 6146 Medical
Imaging 2006: Image Perception, Observer Performance, and Technology Assessment,
Yulei Jiang and Miguel P. Eckstein, Eds., SPIE, Bellingham, WA, 2006, pp. 61460A1-61460A7.

•  D.  C.  Edwards  and  C.  E.  Metz,  “Optimization  of  restricted  ROC  surfaces  in  three-class 
classification  tasks,”  IEEE  Trans.  Med.  Imag.,  2006,  (submitted). 


5  Conclusions 


During  the  past  year,  with  the  assistance  of  colleagues  in  our  laboratory,  we  have  collected  a 
database of 134 mammographic cases containing malignant and benign clustered microcalcification
lesions, with truth determined by pathology (for biopsied lesions) or mammographic
followup  (benign  lesions  only),  and  we  have  ported  the  existing  computerized  scheme  for 
detecting  clustered  microcalcifications  in  mammograms  from  legacy  computer  systems  no 
longer  in  operation  to  workstations  currently  in  use  for  this  project. 

We  have  continued  to  advance  our  theoretical  understanding  of  the  three-group  ideal 
observer  and  methods  of  evaluating  its  performance.  We  showed  that  the  three  decision 
boundary  lines  used  by  the  three-group  ideal  observer  are  not  arbitrary,  but  are  intricately 
related  to  one  another.  We  analyzed  several  recently  proposed  three-group  classification 
methods in terms of the three-group ideal observer. We reported on the important theoretical
results we had developed to date at the 2005 Breast Cancer Research Program Era of Hope
Meeting. Finally, we developed principled theoretical motivations for various proposed
three-group classification methods, given in each case the selection of a restricted or simplified
three-group evaluation methodology.

Although  our  primary  research  accomplishments  have  been  theoretical,  they  are  crucial 
steps  in  the  development  of  a  practical  three-group  classifier  and  a  fully  general  three-group 
performance  evaluation  methodology.  Despite  the  considerable  difficulties  involved  in  such 
development, a CAD scheme incorporating a three-group classifier as we propose could
potentially allow radiologists to detect more malignant breast lesions without increasing their
FP  biopsy  rate.  We  believe  this  goal  to  be  worth  the  necessary  effort  on  our  part. 
Abstract: We are attempting to develop expressions for the coordinates
of points on the three-class ideal observer’s receiver operating
characteristic (ROC) hypersurface as functions of the set
of decision criteria used by the ideal observer. This is considerably
more difficult than in the two-class classification task, because the
conditional probabilities in question are not simply related to the
cumulative distribution functions of the decision variables, and because
the slopes and intercepts of the decision boundary lines are
not independent; given the locations of two of the lines, the location
of the third will be constrained depending on the other two. In this
paper, we attempt to characterize those constraining relationships
among the three-class ideal observer’s decision boundary lines. As
a result, we show that the relationship between the decision criteria
and the misclassification probabilities is not one-to-one, as it is for
the two-class ideal observer.

Index Terms: Ideal observers, ROC analysis, three-class classification.


I.  Introduction 

Receiver operating characteristic (ROC) analysis is the
accepted methodology for analyzing the performance of
a two-class classifier [1], in particular for medical decision-making
tasks in which a patient is diagnosed as having or not
having a particular condition based on features of a medical
image [2]. In judging the performance of an observer measured
via ROC analysis, the standard for comparison is the so-called
ideal observer, that observer which outperforms any other possible
observer given the statistical variability of the observational
data being classified [1], [3]. Although the general form
of the ideal observer in a classification task with three or more
classes has been known for some time [3], the considerable complexities
inherent to this model compared to the two-class classification
task have hampered the development of extensions
of ROC analysis which are both fully general and practically
useful. (Several researchers have recently proposed restricted
observer models or restricted evaluation methods [4]-[7].)

Despite these difficulties, research continues in this area because
the advantages to be gained from a three-class classifier
and appropriate evaluation methodology are considerable. In
our own case, we seek to combine existing computer-aided diagnosis
(CAD) schemes for detecting [8]-[12] mammographic
mass lesions and classifying [13]-[17] them as malignant or benign.
The combined scheme would serve as a fully automated
classifier (the existing classifier requires initial manual identification
of lesions by a radiologist), potentially allowing radiologists
to reduce their false-positive biopsy rate without reducing
their sensitivity for detection of malignancies. Simply concatenating
the two types of scheme in a two-stage classifier would be
inadequate, because the output of the detection scheme will necessarily
include false-positive (FP) computer detections in addition
to the malignant and benign lesions to be classified. These
FP computer detections correspond to objects which were by
design not included in the training sample of the classification
scheme, because they are not members of the data population
(benign and malignant mass breast lesions) for which the classification
scheme was created. It is clear then that the detection
scheme’s output cannot be used unmodified as the input to the
classification scheme.

Our  initial  efforts  toward  the  goal  of  developing  a  true 
three-class  classifier  have  been  more  theoretical  than  practical 
so  far.  We  have  shown  that,  just  as  the  two-class  ideal  observer 
achieves  the  optimal  two-class  ROC  curve  for  a  given  task, 
the N-class ideal observer achieves the optimal N-class ROC
hypersurface  [18].  (Note  that  the  ideal  observer  is  formally 
defined  as  that  which  minimizes  the  expected  Bayes  risk  [3], 
and  not  in  terms  of  classification  performance,  making  this 
a  nontrivial  observation  in  both  cases.)  More  soberingly,  we 
found  recently  that  an  obvious  generalization  of  the  well-known 
performance  metric,  the  area  under  the  ROC  curve  (AUC),  is 
not  a  useful  performance  metric  in  a  classification  task  with 
three  or  more  classes  [19]. 

At present we are attempting to develop expressions for the
coordinates of points on the three-class ideal observer’s ROC
hypersurface (the conditional probabilities for misclassifying
observations [18], [20], [21]) as functions of the set of decision
criteria used by the ideal observer. This is considerably more
difficult than in the two-class classification task for two reasons.
First, the conditional probabilities in question are not simply related
to the cumulative distribution functions (cdfs) of the decision
variables, but are integrals of those variables over domains
determined by three decision boundary lines [3]. Second, the
slopes and intercepts of the decision boundary lines are not independent;
given the locations of two of the lines, we have found
recently that the location of the third will be constrained depending
on the other two.

In this paper, we attempt to characterize the constraining relationships
just mentioned among the three-class ideal observer’s
decision boundary lines. Although this paper is admittedly still
removed from image analysis per se, we hope it may prove of
interest to the CAD community and ultimately of relevance to a
wide variety of medical image analysis tasks. In the next section
we briefly review the structure of the three-class ideal observer
and the notation we have been using to characterize it [18]. In
Section III, we show that for a given location (slope and y-intercept)
of the decision boundary line separating the first and third
classes, the location of one of the remaining two lines is constrained
in a particular way based on the location of the other.

These results are discussed in Section IV. Given the arbitrariness
of the labels applied to the three classes (i.e., which classes
are considered first, second, or third), one would expect the selection
of the fixed line in Section III to be similarly arbitrary,
and indeed in Appendices A and B we show that corresponding
and consistent results are obtained if one takes the location of
the decision boundary line separating the second and third, or
first and second, classes, respectively, to be given.

II.  The  Three-Class  Ideal  Observer 

In [18], we showed that an $N$-class ideal observer makes decisions
by partitioning a likelihood ratio decision variable space,
where the boundaries of the partitions are given by hyperplanes

$$\text{decide } \mathbf{d} = \pi_i \text{ iff } \sum_{k=1}^{N-1} (U_{i|k} - U_{j|k})\, P(\mathbf{t} = \pi_k)\, \mathrm{LR}_k > (U_{j|N} - U_{i|N})\, P(\mathbf{t} = \pi_N) \quad \{j < i\} \tag{1}$$

$$\text{and } \sum_{k=1}^{N-1} (U_{i|k} - U_{j|k})\, P(\mathbf{t} = \pi_k)\, \mathrm{LR}_k > (U_{j|N} - U_{i|N})\, P(\mathbf{t} = \pi_N) \quad \{j > i\}. \tag{2}$$

Here, $U_{i|j}$ is the utility of deciding an observation is from class
$\pi_i$ given that it is actually from class $\pi_j$; $P(\mathbf{t} = \pi_k)$ is the a priori
probability that an observation is drawn from class $\pi_k$; and $\mathrm{LR}_k$
is the $k$th likelihood ratio, defined by the ratio $p(\mathbf{x}|\pi_k)/p(\mathbf{x}|\pi_N)$
of the probability density functions of the observational data.
(We use boldface type to denote random variables.) The partitioning
is determined by the parameters

$$\gamma_{ijk} \equiv (U_{i|k} - U_{j|k})\, P(\mathbf{t} = \pi_k) \tag{3}$$

with $i$, $j$, and $k$ varying from 1 to $N$, and $j \neq i$. Note that these
parameters are not independent, however, because

$$\gamma_{ijk} = \gamma_{kjk} - \gamma_{kik}. \tag{4}$$

We can impose the reasonable condition that the utility for
correctly classifying an observation from a given class should be
greater than any utility for incorrectly classifying an observation
from the same class, i.e., $U_{i|i} > U_{j|i}$ $\{i \neq j\}$. This gives, for
$j \neq i$,

$$\gamma_{iji} > 0 \tag{5}$$

leaving $N(N - 1)$ positive parameters (the rest are derivable
from (4)).

Finally, note that the hyperplanes represented by (1) and (2)
are unchanged if we multiply all of these equations by a single
scalar, such as 1 [...]. This leaves us with $N^2 - N - 1$
degrees of freedom, as expected.
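As a check on the bookkeeping, the parameters in (3) can be computed directly from a utility matrix and a set of prior probabilities. The Python sketch below uses a made-up utility matrix and priors (assumptions for illustration, not values from any study) to verify the dependence relation (4), the positivity condition (5), and the count of N^2 - N - 1 free parameters. The function names are hypothetical.

```python
def gamma(i, j, k, utilities, priors):
    """gamma_ijk = (U_{i|k} - U_{j|k}) * P(t = pi_k), with 1-based class labels.

    utilities[i - 1][k - 1] holds U_{i|k}, the utility of deciding class i
    when the true class is k.
    """
    return (utilities[i - 1][k - 1] - utilities[j - 1][k - 1]) * priors[k - 1]


def satisfies_relation_4(utilities, priors, tol=1e-12):
    """Check gamma_ijk == gamma_kjk - gamma_kik for every i, j, k."""
    labels = range(1, len(priors) + 1)
    return all(
        abs(gamma(i, j, k, utilities, priors)
            - (gamma(k, j, k, utilities, priors)
               - gamma(k, i, k, utilities, priors))) < tol
        for i in labels for j in labels for k in labels
    )


def satisfies_condition_5(utilities, priors):
    """Check gamma_iji > 0 for every j != i."""
    labels = range(1, len(priors) + 1)
    return all(gamma(i, j, i, utilities, priors) > 0
               for i in labels for j in labels if j != i)


def degrees_of_freedom(n_classes):
    """N(N - 1) positive parameters, less one for the arbitrary overall scale."""
    return n_classes * (n_classes - 1) - 1


# Hypothetical example: in each column (true class), the correct decision
# carries the largest utility.
U = [[1.0, 0.4, 0.0],
     [0.3, 1.0, 0.1],
     [0.0, 0.2, 1.0]]
P = [0.2, 0.3, 0.5]
print(satisfies_relation_4(U, P), satisfies_condition_5(U, P), degrees_of_freedom(3))
```

For N = 2 the count reduces to a single degree of freedom, the familiar likelihood ratio threshold; for N = 3 it gives five.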

The behavior of a three-class ideal observer is completely
determined by the three decision boundary lines

$$\gamma_{121}\mathrm{LR}_1 - \gamma_{212}\mathrm{LR}_2 = \gamma_{313} - \gamma_{323} \tag{6}$$

$$\gamma_{131}\mathrm{LR}_1 + (\gamma_{232} - \gamma_{212})\mathrm{LR}_2 = \gamma_{313} \tag{7}$$

$$(\gamma_{131} - \gamma_{121})\mathrm{LR}_1 + \gamma_{232}\mathrm{LR}_2 = \gamma_{323} \tag{8}$$

which we call, respectively, the “1-vs-2” line, the “1-vs-3” line,
and the “2-vs-3” line. Note that if any two of these lines intersect,
the third line must also share this intersection point. We
also emphasize the simple interpretation, from (3), of each of the
$\gamma_{iji}$ parameters appearing in these decision boundary line equations
as the difference in utilities between a “correct” and one
particular “incorrect” decision (scaled by the a priori probability
of the true class in question); and of each difference in the $\gamma_{iji}$
parameters as a difference in utilities between two possible “incorrect”
decisions [again scaled by the a priori probability of the
true class in question; e.g., $\gamma_{313} - \gamma_{323} = (U_{2|3} - U_{1|3})P(\mathbf{t} = \pi_3)$].
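The following Python sketch, with hypothetical utilities and priors, compares two ways of making a three-class decision at a point (LR_1, LR_2): directly maximizing expected utility (with LR_3 = 1 by definition), and testing on which side of the lines (6)-(8) the point lies. It also returns the residuals of the three line equations, so that one can check numerically that (7) minus (6) gives (8), which is why the three lines share any intersection point. Function names are illustrative only.

```python
def gamma(i, j, k, U, P):
    """gamma_ijk = (U_{i|k} - U_{j|k}) * P(t = pi_k); U[i-1][k-1] holds U_{i|k}."""
    return (U[i - 1][k - 1] - U[j - 1][k - 1]) * P[k - 1]


def boundary_values(lr1, lr2, U, P):
    """Left side minus right side of (6), (7) and (8) at the point (LR1, LR2)."""
    g121, g131 = gamma(1, 2, 1, U, P), gamma(1, 3, 1, U, P)
    g212, g232 = gamma(2, 1, 2, U, P), gamma(2, 3, 2, U, P)
    g313, g323 = gamma(3, 1, 3, U, P), gamma(3, 2, 3, U, P)
    f12 = g121 * lr1 - g212 * lr2 - (g313 - g323)
    f13 = g131 * lr1 + (g232 - g212) * lr2 - g313
    f23 = (g131 - g121) * lr1 + g232 * lr2 - g323
    return f12, f13, f23


def decide_from_lines(lr1, lr2, U, P):
    """Decide a class from the signs of the three boundary-line residuals."""
    f12, f13, f23 = boundary_values(lr1, lr2, U, P)
    if f12 > 0 and f13 > 0:      # class 1 preferred to both 2 and 3
        return 1
    if f12 <= 0 and f23 > 0:     # class 2 preferred to both 1 and 3
        return 2
    return 3


def decide_by_utility(lr1, lr2, U, P):
    """Decide the class with the largest expected utility (up to a common factor)."""
    lrs = (lr1, lr2, 1.0)        # LR_3 = p(x|pi_3) / p(x|pi_3) = 1
    expected = [sum(U[i][k] * P[k] * lrs[k] for k in range(3)) for i in range(3)]
    return expected.index(max(expected)) + 1


# Hypothetical utilities and priors, for illustration only.
U = [[1.0, 0.4, 0.0],
     [0.3, 1.0, 0.1],
     [0.0, 0.2, 1.0]]
P = [0.2, 0.3, 0.5]
for point in [(0.1, 0.1), (3.0, 0.1), (0.1, 3.0)]:
    print(point, decide_by_utility(*point, U, P), decide_from_lines(*point, U, P))
```

Away from the boundary lines themselves, the two procedures agree; exactly on a line the choice depends on a tie-breaking convention, which this sketch does not attempt to reproduce.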

From the conditions on the $\gamma_{iji}$ parameters in (5), we can
readily derive conditions on the decision boundaries themselves.
If we denote the slope of the “i-vs-j” line by $m_{ij}$, its $y$-intercept
by $b_{ij}$, and its $x$-intercept by $x_{ij}$, we have

$$m_{12} = \frac{\gamma_{121}}{\gamma_{212}} > 0 \tag{9}$$

$$x_{13} = \frac{\gamma_{313}}{\gamma_{131}} > 0 \tag{10}$$

$$b_{23} = \frac{\gamma_{323}}{\gamma_{232}} > 0. \tag{11}$$

These are the three conditions stated in [22].
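As a hedged illustration, the slopes and intercepts of all three lines can be read off by solving (6)-(8) for $\mathrm{LR}_2$ (with $\mathrm{LR}_1$ on the horizontal axis). The sketch below assumes that $\gamma_{232} \neq \gamma_{212}$ and $\gamma_{131} \neq \gamma_{121}$ so that no division by zero occurs; the function name `boundary_lines` and the parameter values are hypothetical.

```python
def boundary_lines(g):
    """Slope m, y-intercept b and x-intercept x of each boundary line.

    g maps "iji" to gamma_iji. Assumes gamma_232 != gamma_212 and
    gamma_131 != gamma_121 (the finite, non-degenerate cases).
    """
    d13 = g["232"] - g["212"]   # coefficient of LR2 in line (7)
    d23 = g["131"] - g["121"]   # coefficient of LR1 in line (8)
    return {
        "12": {"m": g["121"] / g["212"],
               "b": (g["323"] - g["313"]) / g["212"],
               "x": (g["313"] - g["323"]) / g["121"]},
        "13": {"m": -g["131"] / d13,
               "b": g["313"] / d13,
               "x": g["313"] / g["131"]},
        "23": {"m": -d23 / g["232"],
               "b": g["323"] / g["232"],
               "x": g["323"] / d23},
    }


# Hypothetical positive parameters, chosen only for illustration.
gammas = {"121": 1.0, "131": 3.0, "212": 2.0,
          "232": 5.0, "313": 4.0, "323": 7.0}
lines = boundary_lines(gammas)

# Conditions (9)-(11) hold for any positive choice of the gammas.
print(lines["12"]["m"] > 0, lines["13"]["x"] > 0, lines["23"]["b"] > 0)
```

Only the three quantities in (9)-(11) have a sign fixed by (5) alone; the sign of every other slope or intercept depends on one of the differences $\gamma_{232} - \gamma_{212}$, $\gamma_{131} - \gamma_{121}$, or $\gamma_{313} - \gamma_{323}$.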


III.  Restrictions  Determined  by  the  Parameters  of  the 
“1-VS.-3”  Line 


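Constraints on the decision boundaries, in addition to those
given in (9)-(11), can be obtained by considering the two cases
$\gamma_{232} - \gamma_{212} > 0$ and $\gamma_{232} - \gamma_{212} < 0$. In the first case (i.e.,
$\gamma_{232} > \gamma_{212}$, or $U_{1|2} > U_{3|2}$), we have

$$m_{13} = \frac{-\gamma_{131}}{\gamma_{232} - \gamma_{212}} < 0 \tag{12}$$

$$b_{13} = \frac{\gamma_{313}}{\gamma_{232} - \gamma_{212}} > 0. \tag{13}$$

We also have

$$m_{23} = \frac{-(\gamma_{131} - \gamma_{121})}{\gamma_{232}}
        = \frac{(\gamma_{232} - \gamma_{212})\,m_{13} + \gamma_{212}\,m_{12}}{\gamma_{232}}
        = \left(1 - \frac{\gamma_{212}}{\gamma_{232}}\right) m_{13} + \frac{\gamma_{212}}{\gamma_{232}}\,m_{12}. \tag{14}$$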


This is a weighted sum of the slopes $m_{12}$ and $m_{13}$, where the
weights are positive and sum to one. Since we must have $m_{13} < m_{12}$
from (9) and (12), it must therefore be the case that

$$m_{13} < m_{23} < m_{12}. \tag{15}$$
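The weighted-sum argument behind (14) and (15) can be checked numerically. The following sketch draws hypothetical positive integer parameters with $\gamma_{232} > \gamma_{212}$ and uses exact rational arithmetic, so the identity (14) can be tested with `==`; the sampling ranges and the seed are arbitrary choices, not part of the derivation.

```python
import random
from fractions import Fraction


def slopes(g121, g131, g212, g232):
    """Slopes of the three boundary lines, from (9), (12) and (14)."""
    m12 = Fraction(g121, g212)
    m13 = Fraction(-g131, g232 - g212)
    m23 = Fraction(-(g131 - g121), g232)
    return m12, m13, m23


rng = random.Random(0)          # fixed seed: the run is reproducible
checked = 0
for _ in range(1000):
    g121, g131, g212 = (rng.randint(1, 20) for _ in range(3))
    g232 = g212 + rng.randint(1, 20)    # enforce gamma_232 - gamma_212 > 0
    m12, m13, m23 = slopes(g121, g131, g212, g232)
    w = Fraction(g212, g232)            # weight on m12 in (14), 0 < w < 1
    assert m23 == (1 - w) * m13 + w * m12    # identity (14)
    assert m13 < m23 < m12                   # ordering (15)
    checked += 1
print(checked, "parameter sets satisfy (14) and (15)")
```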


1568 


IEEE  TRANSACTIONS  ON  MEDICAL  IMAGING,  VOL.  24,  NO.  12,  DECEMBER  2005 


Fig. 1. Example ideal observer decision rules for the case $\gamma_{232} - \gamma_{212} > 0$
(implying $m_{13} < 0$ and $b_{13} > 0$) and $b_{12} < 0$. In (a), $x_{12} < x_{13}$, and
the “2-vs-3” line can lie anywhere between the two dashed lines shown (the
region between the lower dashed and dotted lines is excluded because $b_{23} > 0$);
observations in the unlabeled region above this line will be decided “$\pi_2$,” and
those below this line will be decided “$\pi_3$.” In (b), $x_{12} \geq x_{13}$ and the “2-vs-3”
line can lie anywhere in the unlabeled region (provided it shares the intersection
point of the “1-vs-2” and “1-vs-3” lines shown); observations above this line will
be decided “$\pi_2$,” and those below this line will be decided “$\pi_3$.”


Furthermore

$$b_{23} = \frac{\gamma_{323}}{\gamma_{232}}
        = \frac{\gamma_{313} - (\gamma_{313} - \gamma_{323})}{\gamma_{232}}
        = \frac{(\gamma_{232} - \gamma_{212})\,b_{13} + \gamma_{212}\,b_{12}}{\gamma_{232}}
        = \left(1 - \frac{\gamma_{212}}{\gamma_{232}}\right) b_{13} + \frac{\gamma_{212}}{\gamma_{232}}\,b_{12}. \tag{16}$$

This is a weighted sum of the $y$-intercepts $b_{12}$ and $b_{13}$, where the
weights are positive and sum to one; thus, in addition to (15), we
have the condition

$$\min(b_{12}, b_{13}) < b_{23} < \max(b_{12}, b_{13}). \tag{17}$$

If $b_{12} < 0$, then (17) immediately reduces to $b_{12} < b_{23} < b_{13}$
(by (13), we are considering a special case in which $b_{13} > 0$).
This is illustrated in Fig. 1 for the slightly different situations
$x_{12} < x_{13}$ and $x_{12} > x_{13}$. If, on the other hand, $b_{12} > 0$, then
(15) and (17) together imply two possible situations, depending
on whether $b_{12} < b_{13}$ or $b_{12} > b_{13}$. These possibilities are
illustrated in Fig. 2.

Fig. 2. Example ideal observer decision rules for the case $\gamma_{232} - \gamma_{212} > 0$
(implying $m_{13} < 0$ and $b_{13} > 0$) and $b_{12} > 0$. In (a), $b_{12} < b_{13}$, and the
“2-vs-3” line can lie anywhere in the unlabeled region; observations above this
line will be decided “$\pi_2$,” and those below this line will be decided “$\pi_3$.” In
(b), $b_{12} > b_{13}$ and the “2-vs-3” line can lie anywhere between the “1-vs-2” and
“1-vs-3” lines (provided it shares their intersection point); note that observations
in this region will be decided “$\pi_1$” regardless of the position of this line.

We now consider the case $\gamma_{232} - \gamma_{212} < 0$ (i.e., $\gamma_{232} < \gamma_{212}$,
or $U_{1|2} < U_{3|2}$), which yields

$$m_{13} = \frac{-\gamma_{131}}{\gamma_{232} - \gamma_{212}} > 0 \tag{18}$$

$$b_{13} = \frac{\gamma_{313}}{\gamma_{232} - \gamma_{212}} < 0. \tag{19}$$

We now have

$$m_{12} = \frac{\gamma_{121}}{\gamma_{212}}
        = \frac{\gamma_{131} - (\gamma_{131} - \gamma_{121})}{\gamma_{212}}
        = \frac{-(\gamma_{232} - \gamma_{212})\,m_{13} + \gamma_{232}\,m_{23}}{\gamma_{212}}
        = \left(1 - \frac{\gamma_{232}}{\gamma_{212}}\right) m_{13} + \frac{\gamma_{232}}{\gamma_{212}}\,m_{23}. \tag{20}$$

This is again a weighted sum in which the weights are positive
and sum to one, giving

$$\min(m_{13}, m_{23}) < m_{12} < \max(m_{13}, m_{23}). \tag{21}$$

Furthermore

$$b_{12} = \frac{\gamma_{313} - \gamma_{323}}{-\gamma_{212}}
        = \frac{-\gamma_{313} + \gamma_{323}}{\gamma_{212}}
        = \frac{-(\gamma_{232} - \gamma_{212})\,b_{13} + \gamma_{232}\,b_{23}}{\gamma_{212}}
        = \left(1 - \frac{\gamma_{232}}{\gamma_{212}}\right) b_{13} + \frac{\gamma_{232}}{\gamma_{212}}\,b_{23}. \tag{22}$$

This is a weighted sum of the $y$-intercepts $b_{13}$ and $b_{23}$, where the
weights are positive and sum to one; thus, in addition to (21), we
have the condition

$$b_{13} < b_{12} < b_{23} \tag{23}$$

since $b_{13} < b_{23}$ by (11) and (19).

If $m_{23} < 0$, then (21) immediately reduces to $m_{23} < m_{12} < m_{13}$
(by (18), we are considering a special case in which $m_{13} > 0$).
This is illustrated in Fig. 3 for the slightly different situations
$x_{13} < x_{23}$ and $x_{13} > x_{23}$. If, on the other hand, $m_{23} > 0$, then
(21) and (23) together imply two possible situations, depending
on whether $m_{23} < m_{13}$ or $m_{23} > m_{13}$. These possibilities are
illustrated in Fig. 4.

Fig. 3. Example ideal observer decision rules for the case $\gamma_{232} - \gamma_{212} < 0$
(implying $m_{13} > 0$ and $b_{13} < 0$) and $m_{23} < 0$. In (a), $x_{13} < x_{23}$, and the
“1-vs-2” line can lie anywhere between the two dashed lines shown (the region
between the lower dashed and dotted lines is excluded because $m_{12} > 0$);
observations in the unlabeled region above this line will be decided “$\pi_2$,” and
those below this line will be decided “$\pi_1$.” In (b), $x_{13} > x_{23}$ and the “1-vs-2”
line can lie anywhere in the unlabeled region (provided it shares the intersection
point of the “1-vs-3” and “2-vs-3” lines shown); observations above this line will
be decided “$\pi_2$,” and those below this line will be decided “$\pi_1$.”

Fig. 4. Example ideal observer decision rules for the case $\gamma_{232} - \gamma_{212} < 0$
(implying $m_{13} > 0$ and $b_{13} < 0$) and $m_{23} > 0$. In (a), $m_{23} < m_{13}$, and the
“1-vs-2” line can lie anywhere in the unlabeled region; observations above this
line will be decided “$\pi_2$,” and those below this line will be decided “$\pi_1$.” In (b),
$m_{23} > m_{13}$, and the “1-vs-2” line can lie anywhere between the “1-vs-3” and
“2-vs-3” lines (provided it shares their intersection point); note that observations
in this region will be decided “$\pi_3$” regardless of the position of this line.

One may of course ask what happens when $\gamma_{232} - \gamma_{212} = 0$
(i.e., $\gamma_{232} = \gamma_{212}$, or $U_{1|2} = U_{3|2}$). In this case, both $m_{13}$ and
$b_{13}$ are infinite. Furthermore

$$m_{23} = \frac{-\gamma_{131}}{\gamma_{232}} + m_{12} < m_{12} \tag{24}$$

and

$$b_{12} = \frac{\gamma_{323} - \gamma_{313}}{\gamma_{212}}
        = \frac{\gamma_{323}}{\gamma_{232}} + \frac{-\gamma_{313}}{\gamma_{212}}
        = b_{23} + \frac{-\gamma_{313}}{\gamma_{212}} < b_{23}. \tag{25}$$


Together, (24) and (25) can be considered either a special case
of the inequalities (15) and (17), if we take $m_{13} = -\infty$ and
$b_{13} = +\infty$; or of the inequalities (21) and (23), if we take
$m_{13} = +\infty$ and $b_{13} = -\infty$. This situation, for the slightly
different cases $b_{12} < 0$ and $b_{12} > 0$, is illustrated in Fig. 5.
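The conditions for the case $\gamma_{232} - \gamma_{212} < 0$ and for the boundary case $\gamma_{232} = \gamma_{212}$ can be spot-checked in the same spirit. The sketch below uses hypothetical integer parameters and exact fractions. One detail is an observation of ours rather than a statement from the text: when the “1-vs-3” and “2-vs-3” lines happen to be parallel, the weighted sum (20) forces all three slopes to coincide, so the strict inequality (21) is only asserted when $m_{13} \neq m_{23}$.

```python
import random
from fractions import Fraction

KEYS = ("121", "131", "212", "232", "313", "323")


def lines_12_23(g):
    """Slope and y-intercept of the "1-vs-2" and "2-vs-3" lines."""
    m12 = Fraction(g["121"], g["212"])
    b12 = Fraction(g["323"] - g["313"], g["212"])
    m23 = Fraction(-(g["131"] - g["121"]), g["232"])
    b23 = Fraction(g["323"], g["232"])
    return m12, b12, m23, b23


def line_13(g):
    """Slope and y-intercept of the "1-vs-3" line (finite case only)."""
    d = g["232"] - g["212"]
    return Fraction(-g["131"], d), Fraction(g["313"], d)


rng = random.Random(1)
n_negative = n_zero = 0
for k in range(1000):
    g = {key: rng.randint(2, 20) for key in KEYS}
    if k % 2 == 0:
        g["232"] = rng.randint(1, g["212"] - 1)   # gamma_232 < gamma_212
    else:
        g["232"] = g["212"]                       # boundary case
    m12, b12, m23, b23 = lines_12_23(g)
    if g["232"] < g["212"]:
        m13, b13 = line_13(g)
        assert m13 > 0 and b13 < 0                # (18), (19)
        if m13 != m23:                            # skip parallel lines
            assert min(m13, m23) < m12 < max(m13, m23)   # (21)
        assert b13 < b12 < b23                    # (23)
        n_negative += 1
    else:
        assert m23 < m12                          # (24)
        assert b12 < b23                          # (25)
        n_zero += 1
print(n_negative, n_zero)
```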

Fig. 5. Example ideal observer decision rules for the case $\gamma_{232} - \gamma_{212} = 0$
(implying $m_{13} = \mp\infty$ and $b_{13} = \pm\infty$). In (a), $b_{12} < 0$ and the “2-vs-3” line
can lie anywhere between the two dashed lines shown (the region between the
lower dashed and dotted lines is excluded because $b_{23} > 0$); observations in the
unlabeled region above this line will be decided “$\pi_2$,” and those below this line
will be decided “$\pi_3$.” In (b), $b_{12} > 0$ and the “2-vs-3” line can lie anywhere
in the unlabeled region; observations above this line will be decided “$\pi_2$,” and
those below this line will be decided “$\pi_3$.”

In this section, the possible values of the quantity $\gamma_{232} - \gamma_{212}$
were considered in order to determine properties of the ideal
observer decision boundary lines. It may be argued that the choice
of a parameter from the “1-vs-3” line, i.e., one of the three
available lines, must be an arbitrary one. In fact, we may consider
taking another parameter (or combination of parameters) from
(6)-(8), and using it to determine conditions on the properties
of the decision boundary lines as above. Given that all possible
values of the quantity $\gamma_{232} - \gamma_{212}$ were considered, it is expected
that no new conditions should be determinable (let alone
conditions inconsistent with those already determined). In fact, this
can readily be shown to be the case; however, due to the
repetitive nature of the derivations involved, these are relegated to
Appendices A and B.

IV.  Discussion  and  Conclusion 

The  repetitive  nature  of  the  algebraic  manipulations  given  in 
the  preceding  section  and  the  Appendices  should  not  be  allowed 
to  distract  from  the  fundamental  point  being  made:  given  the 
locations  of  two  of  the  decision  boundary  lines,  the  location 
of  the  third  is  not  completely  arbitrary.  That  is,  aside  from  the 
obvious [given (6)-(8)] constraint that the lines must share a
common  intersection  point,  it  can  also  be  shown  that  the  slope 
of  the  third  line  is  constrained  by  the  slopes  of  the  first  two. 

The significance of this result may be difficult to appreciate
at first glance. It is perhaps best illustrated by comparison with
the two-class classifier, for which the ROC operating point
coordinates [e.g., the true-positive fraction (TPF) and false-positive
fraction (FPF)] are determined by a single decision criterion $\gamma$,
which is free to vary without restriction throughout its domain
of definition. For the two-class ideal observer, in particular, an
observation is decided “positive” (assigned to the class $\pi_1$) if
$\mathrm{LR}_1 > \gamma$, where $\gamma$ can take on any nonnegative value.
Furthermore, the FPF and TPF are related in a very simple way to the
cdfs of $\mathrm{LR}_1$, and are thus monotonic in the decision criterion $\gamma$.
For the three-class ideal observer, this straightforward
relationship is lost; indeed, Figs. 2(b), 4(b), 7(b), 9(b), 12(b), and 14(b)
show that for certain values of four of the five decision criteria
$\gamma_{iji}$, the misclassification probabilities (i.e., the ROC operating
point coordinates) can be independent of the fifth decision
criterion.

Fig. 6. Example ideal observer decision rules for the case $\gamma_{131} - \gamma_{121} > 0$
(implying $1/m_{23} < 0$ and $x_{23} > 0$) and $x_{12} < 0$. In (a), $b_{12} < b_{23}$, and
the “1-vs-3” line can lie anywhere between the two dashed lines shown (the
region between the left dashed and dotted lines is excluded because $x_{13} > 0$);
observations in the unlabeled region to the right of this line will be decided “$\pi_1$,”
and those to the left of this line will be decided “$\pi_3$.” In (b), $b_{12} > b_{23}$ and the
“1-vs-3” line can lie anywhere in the unlabeled region (provided it shares the
intersection point of the “1-vs-2” and “2-vs-3” lines shown); observations to the
right of this line will be decided “$\pi_1$,” and those to the left of this line will be
decided “$\pi_3$.”

More succinctly, the relationship between the decision
criteria and the misclassification probabilities is not one-to-one,
as it is for the two-class ideal observer. A correct formulation
of the misclassification probabilities as functions of the
decision criteria (necessary for an explicit calculation of the ideal
observer’s ROC hypersurface given the decision variable
probability density functions) will require careful consideration of
this issue. Although we have shown previously that the
hypervolume under the ROC hypersurface is not a useful performance
metric in general [19], it is still the case that the ROC
hypersurface in terms of the set of misclassification probabilities (six
in the three-class classification task) is a complete description
of observer performance. We expect that a useful performance
metric, assuming one exists, will be derived in some fashion
from the ROC hypersurface. It is thus important to develop a
complete understanding of the rather complicated relationships
among the quantities involved, and we hope that this paper will
prove of some use toward this goal.
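The loss of a one-to-one relationship can be made concrete with a small sketch of the situation described for Fig. 2(b). Two hypothetical parameter sets, `A` and `B`, are constructed to share the same “1-vs-2” and “1-vs-3” lines (with $b_{12} > b_{13} > 0$) while having different “2-vs-3” lines. The sketch assumes the expected-utility reading of (6)-(8) (an observation goes to the class with the largest score) and checks that both sets give identical decisions on a grid of nonnegative likelihood ratios. The grid offset of 1/30 is chosen only to keep grid points off the boundary lines.

```python
from fractions import Fraction


def decide(g, lr1, lr2):
    """Class (1, 2 or 3) with the largest score at the point (LR1, LR2)."""
    s1 = g["131"] * lr1 + (g["232"] - g["212"]) * lr2 - g["313"]
    s2 = (g["131"] - g["121"]) * lr1 + g["232"] * lr2 - g["323"]
    scores = {1: s1, 2: s2, 3: 0}
    return max(scores, key=scores.get)


def line_23(g):
    """(slope, y-intercept) of the "2-vs-3" line, from (8)."""
    return (Fraction(-(g["131"] - g["121"]), g["232"]),
            Fraction(g["323"], g["232"]))


# Hypothetical parameters. Both sets give the "1-vs-2" line LR2 = LR1 + 2
# and the "1-vs-3" line LR2 = -LR1 + 1; only the "2-vs-3" line differs.
A = {"121": 1, "131": 1, "212": 1, "232": 2, "313": 1, "323": 3}
B = {"121": 1, "131": 2, "212": 1, "232": 3, "313": 2, "323": 4}

grid = [(Fraction(i, 10), Fraction(j, 10) + Fraction(1, 30))
        for i in range(40) for j in range(40)]
decisions_a = [decide(A, x, y) for x, y in grid]
decisions_b = [decide(B, x, y) for x, y in grid]

print("same decisions:", decisions_a == decisions_b)
print("different 2-vs-3 lines:", line_23(A) != line_23(B))
```

Since the two parameter sets partition the nonnegative quadrant identically, they yield the same misclassification probabilities for any underlying distributions, even though their decision criteria differ.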


Appendix  A 

Restrictions  Determined  by  the  Parameters  of  the 
“2-VS.-3”  Line 

Consider the quantity $\gamma_{131} - \gamma_{121}$ from (8). In particular, when
$\gamma_{131} - \gamma_{121} > 0$ (i.e., $\gamma_{131} > \gamma_{121}$, or $U_{2|1} > U_{3|1}$), we have

$$\frac{1}{m_{23}} = \frac{-\gamma_{232}}{\gamma_{131} - \gamma_{121}} < 0 \tag{26}$$

$$x_{23} = \frac{\gamma_{323}}{\gamma_{131} - \gamma_{121}} > 0. \tag{27}$$

Through reasoning similar to that of Section III, we also have

$$\frac{1}{m_{23}} < \frac{1}{m_{13}} < \frac{1}{m_{12}} \tag{28}$$

and

$$\min(x_{12}, x_{23}) < x_{13} < \max(x_{12}, x_{23}). \tag{29}$$

If $x_{12} < 0$, then (29) immediately reduces to $x_{12} < x_{13} < x_{23}$
(by (27), we are considering a special case in which $x_{23} > 0$).
This is illustrated in Fig. 6 for the slightly different situations
$b_{12} < b_{23}$ and $b_{12} > b_{23}$. If, on the other hand, $x_{12} > 0$, then
(28) and (29) together imply two possible situations, depending
on whether $x_{12} < x_{23}$ or $x_{12} > x_{23}$. These possibilities are
illustrated in Fig. 7.

Fig. 7. Example ideal observer decision rules for the case $\gamma_{131} - \gamma_{121} > 0$
(implying $1/m_{23} < 0$ and $x_{23} > 0$) and $x_{12} > 0$. In (a), $x_{12} < x_{23}$, and
the “1-vs-3” line can lie anywhere in the unlabeled region; observations to the
left of this line will be decided “$\pi_1$,” and those to the right of this line will be
decided “$\pi_3$.” In (b), $x_{12} > x_{23}$ and the “1-vs-3” line can lie anywhere between
the “1-vs-2” and “2-vs-3” lines (provided it shares their intersection point); note
that observations in this region will be decided “$\pi_2$” regardless of the position
of this line.

If $\gamma_{131} - \gamma_{121} < 0$ (i.e., $\gamma_{131} < \gamma_{121}$, or $U_{2|1} < U_{3|1}$), we have

$$\frac{1}{m_{23}} = \frac{-\gamma_{232}}{\gamma_{131} - \gamma_{121}} > 0 \tag{30}$$

$$x_{23} = \frac{\gamma_{323}}{\gamma_{131} - \gamma_{121}} < 0. \tag{31}$$

One can also show

$$\min\left(\frac{1}{m_{13}}, \frac{1}{m_{23}}\right) < \frac{1}{m_{12}} < \max\left(\frac{1}{m_{13}}, \frac{1}{m_{23}}\right) \tag{32}$$

and

$$x_{23} < x_{12} < x_{13}. \tag{33}$$

If $1/m_{13} < 0$, then (32) immediately reduces to $1/m_{13} <
1/m_{12} < 1/m_{23}$ (by (30), we are considering a special case in
which $1/m_{23} > 0$). This is illustrated in Fig. 8 for the slightly
different situations $b_{23} < b_{13}$ and $b_{23} > b_{13}$. If, on the other
hand, $1/m_{13} > 0$, then (32) and (33) together imply two
possible situations, depending on whether $1/m_{13} < 1/m_{23}$ or
$1/m_{13} > 1/m_{23}$. These possibilities are illustrated in Fig. 9.

Fig. 8. Example ideal observer decision rules for the case $\gamma_{131} - \gamma_{121} < 0$
(implying $1/m_{23} > 0$ and $x_{23} < 0$) and $1/m_{13} < 0$. In (a), $b_{23} < b_{13}$,
and the “1-vs-2” line can lie anywhere between the two dashed lines shown
(the region between the vertical dashed and dotted lines is excluded because
$m_{12} > 0$ and, therefore, $1/m_{12} > 0$); observations in the unlabeled region
above this line will be decided “$\pi_2$,” and those below this line will be decided
“$\pi_1$.” In (b), $b_{23} > b_{13}$ and the “1-vs-2” line can lie anywhere in the unlabeled
region (provided it shares the intersection point of the “1-vs-3” and “2-vs-3”
lines shown); observations above this line will be decided “$\pi_2$,” and those below
this line will be decided “$\pi_1$.”

Fig. 9. Example ideal observer decision rules for the case $\gamma_{131} - \gamma_{121} < 0$
(implying $1/m_{23} > 0$ and $x_{23} < 0$) and $1/m_{13} > 0$. In (a),
$1/m_{13} < 1/m_{23}$, and the “1-vs-2” line can lie anywhere in the unlabeled
region; observations above this line will be decided “$\pi_2$,” and those below this
line will be decided “$\pi_1$.” In (b), $1/m_{13} > 1/m_{23}$ and the “1-vs-2” line can
lie anywhere between the “1-vs-3” and “2-vs-3” lines (provided it shares their
intersection point); note that observations in this region will be decided “$\pi_3$”
regardless of the position of this line.

Finally, we consider the case $\gamma_{131} - \gamma_{121} = 0$ ($\gamma_{131} = \gamma_{121}$,
or $U_{2|1} = U_{3|1}$), in which both $1/m_{23}$ and $x_{23}$ are infinite. We
now have

$$\frac{1}{m_{13}} < \frac{1}{m_{12}} \tag{34}$$

and

$$x_{12} < x_{13}. \tag{35}$$

Together, (34) and (35) can be considered either a special
case of the inequalities (28) and (29), if we take $1/m_{23} = -\infty$
and $x_{23} = +\infty$; or of the inequalities (32) and (33), if we take
$1/m_{23} = +\infty$ and $x_{23} = -\infty$. This situation, for the slightly
different cases $x_{12} < 0$ and $x_{12} > 0$, is illustrated in Fig. 10.

Fig. 10. Example ideal observer decision rules for the case $\gamma_{131} - \gamma_{121} = 0$
(implying $1/m_{23} = \mp\infty$ and $x_{23} = \pm\infty$). In (a), $x_{12} < 0$, and the “1-vs-3”
line can lie anywhere between the two dashed lines shown (the region between
the leftmost dashed and dotted lines is excluded because $x_{13} > 0$); observations
in the unlabeled region to the right of this line will be decided “$\pi_1$,” and those
to the left of this line will be decided “$\pi_3$.” In (b), $x_{12} \geq 0$ and the “1-vs-3”
line can lie anywhere in the unlabeled region; observations to the right of this
line will be decided “$\pi_1$,” and those to the left of this line will be decided “$\pi_3$.”

Notice that every figure in this appendix has one or more
corresponding figures in Section III (depending on the possible
values of the undetermined decision boundary parameter being
illustrated in that figure). Specifically

    Fig. 6(a)   =>  Figs. 2(a), 3(a), 5(b)
    Fig. 6(b)   =>  Fig. 2(b)
    Fig. 7(a)   =>  Figs. 1(a), 3(a), 5(a)
    Fig. 7(b)   =>  Figs. 1(b), 3(b), 5(a)
    Fig. 8(a)   =>  Figs. 1(a), 2(a)
    Fig. 8(b)   =>  Fig. 2(b)
    Fig. 9(a)   =>  Figs. 4(a), 5(a), 5(b)
    Fig. 9(b)   =>  Fig. 4(b)
    Fig. 10(a)  =>  Figs. 2(a), 4(a), 5(b), 2(b)
    Fig. 10(b)  =>  Figs. 1(a), 4(a), 5(a).

That is, none of the conditions derived in this section are
inconsistent with those derived in Section III. More importantly, note
the symmetry between the corresponding equations and figures
in Section III and this appendix, if one “swaps” the labels of
classes $\pi_1$ and $\pi_2$, and additionally replaces $m_{ij}$ with $1/m_{i'j'}$,
$x_{ij}$ with $b_{i'j'}$, and $b_{ij}$ with $x_{i'j'}$ ($i' = 1$ if $i = 2$, $2$ if $i = 1$, and
$3$ if $i = 3$; similarly for $j$). Intuitively, if one “flips” the figures
in one section about the $y = x$ line, one obtains the figures in
the other section.


Appendix B

Restrictions Determined by the Parameters of the
“1-VS.-2” Line

In this appendix, we consider the possible values of the
quantity $\gamma_{313} - \gamma_{323}$. As in the preceding Appendix, we expect to
obtain no conditions inconsistent with those already derived.

When $\gamma_{313} - \gamma_{323} > 0$ (i.e., $\gamma_{313} > \gamma_{323}$, or $U_{2|3} > U_{1|3}$), we
have

$$\frac{1}{b_{12}} = \frac{-\gamma_{212}}{\gamma_{313} - \gamma_{323}} < 0 \tag{36}$$

$$\frac{1}{x_{12}} = \frac{\gamma_{121}}{\gamma_{313} - \gamma_{323}} > 0. \tag{37}$$

Through reasoning similar to that of Section III, we also have
$$\frac{1}{b_{12}} < \frac{1}{b_{13}} < \frac{1}{b_{23}} \tag{38}$$

and

$$\min\left(\frac{1}{x_{23}}, \frac{1}{x_{12}}\right) < \frac{1}{x_{13}} < \max\left(\frac{1}{x_{23}}, \frac{1}{x_{12}}\right). \tag{39}$$

If $1/x_{23} < 0$, then (39) immediately reduces to $1/x_{23} <
1/x_{13} < 1/x_{12}$ (by (37), we are considering a special case in
which $1/x_{12} > 0$). This is illustrated in Fig. 11 for the slightly
different situations $m_{23} < m_{12}$ and $m_{23} > m_{12}$. If, on the
other hand, $1/x_{23} > 0$, then (38) and (39) together imply two
possible situations, depending on whether $1/x_{23} < 1/x_{12}$ or
$1/x_{23} > 1/x_{12}$. These possibilities are illustrated in Fig. 12.

If $\gamma_{313} - \gamma_{323} < 0$ (i.e., $\gamma_{313} < \gamma_{323}$, or $U_{2|3} < U_{1|3}$), we have
Fig. 15. Example ideal observer decision rules for the case $\gamma_{313} - \gamma_{323} = 0$
(implying $1/b_{12} = \mp\infty$ and $1/x_{12} = \pm\infty$). In (a), $1/b_{13} < 0$, and the
“2-vs-3” line can lie anywhere between the two dashed lines shown (the region
between the vertical dashed and dotted lines is excluded because $1/b_{23} > 0$);
observations in the unlabeled region above this line will be decided “$\pi_2$,” and
those below this line will be decided “$\pi_3$.” In (b), $1/b_{13} > 0$, and the “2-vs-3”
line can lie anywhere in the unlabeled region; observations above this line will
be decided “$\pi_2$,” and those below this line will be decided “$\pi_3$.”

values  of  the  undetermined  decision  boundary  parameter  being 
illustrated  in  that  figure).  Specifically 


    Fig. 11(a)  =>  Figs. 1(a), 4(a), 5(a)
    Fig. 11(b)  =>  Fig. 4(b)
    Fig. 12(a)  =>  Figs. 1(a), 3(a), 5(a)
    Fig. 12(b)  =>  Figs. 1(b), 3(b), 5(a)
    Fig. 13(a)  =>  Figs. 3(a), 4(a), 5(b)
    Fig. 13(b)  =>  Fig. 4(b)
    Fig. 14(a)  =>  Fig. 2(a)
    Fig. 14(b)  =>  Fig. 2(b)
    Fig. 15(a)  =>  Figs. 3(a), 4(a), 5(b)
    Fig. 15(b)  =>  Figs. 2(a), 3(a), 4(b).

That is, none of the conditions derived in this appendix
are inconsistent with those derived in Section III or
Appendix A. More importantly, note the symmetry between the
corresponding equations and figures in Section III and this
appendix, if one “swaps” the labels of classes $\pi_2$ and $\pi_3$, and
additionally replaces $m_{ij}$ with $1/x_{i'j'}$, $x_{ij}$ with $1/m_{i'j'}$, and
$b_{ij}$ with $1/b_{i'j'}$ ($i' = 1$ if $i = 1$, $2$ if $i = 3$, and $3$ if $i = 2$;
similarly for $j$).
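The two relabeling symmetries stated in Appendices A and B can be verified directly from the line parameters. The sketch below is only an illustration under stated assumptions: the dictionary `g` holds hypothetical values of $\gamma_{iji}$ keyed by `(i, j)`, the helper names are ours, and relabeling the classes is taken to mean $\gamma'_{iji} = \gamma_{i'j'i'}$. Exact fractions are used so that equality can be tested exactly.

```python
from fractions import Fraction as F

PAIRS = [(1, 2), (1, 3), (2, 3)]


def boundary_lines(g):
    """g[i, j] holds gamma_iji; returns {(i, j): (m, b, x)} for i < j."""
    d13 = g[2, 3] - g[2, 1]
    d23 = g[1, 3] - g[1, 2]
    return {
        (1, 2): (F(g[1, 2], g[2, 1]),
                 F(g[3, 2] - g[3, 1], g[2, 1]),
                 F(g[3, 1] - g[3, 2], g[1, 2])),
        (1, 3): (F(-g[1, 3], d13), F(g[3, 1], d13), F(g[3, 1], g[1, 3])),
        (2, 3): (F(-d23, g[2, 3]), F(g[3, 2], g[2, 3]), F(g[3, 2], d23)),
    }


def relabel(g, sigma):
    """Parameters after renaming class i as sigma[i]."""
    return {(i, j): g[sigma[i], sigma[j]] for (i, j) in g}


def image(pair, sigma):
    """The line "i'-vs-j'" that corresponds to the line "i-vs-j"."""
    return tuple(sorted((sigma[pair[0]], sigma[pair[1]])))


# Hypothetical positive parameters with all relevant differences nonzero.
g = {(1, 2): 1, (1, 3): 3, (2, 1): 2, (2, 3): 5, (3, 1): 4, (3, 2): 7}
old = boundary_lines(g)

# Appendix A: swap classes 1 and 2; m -> 1/m', x -> b', b -> x'.
swap12 = {1: 2, 2: 1, 3: 3}
new = boundary_lines(relabel(g, swap12))
ok_a = all(new[p] == (1 / old[q][0], old[q][2], old[q][1])
           for p in PAIRS for q in [image(p, swap12)])

# Appendix B: swap classes 2 and 3; m -> 1/x', x -> 1/m', b -> 1/b'.
swap23 = {1: 1, 2: 3, 3: 2}
new = boundary_lines(relabel(g, swap23))
ok_b = all(new[p] == (1 / old[q][2], 1 / old[q][1], 1 / old[q][0])
           for p in PAIRS for q in [image(p, swap23)])

print(ok_a, ok_b)
```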
Abstract

We  analyze  recently  proposed  decision  rules  for  three-class  classification  from  the 
point  of  view  of  ideal  observer  decision  theory.  We  consider  three-class  decision 
rules  proposed  by  Scurfield,  by  Chan  et  al.,  and  by  Mossman.  Scurfield’s  decision 
rule  is  shown  to  be  a  special  case  of  the  three-class  ideal  observer  decision  rule  in 
three  different  situations.  Chan  et  al.  start  with  an  ideal  observer  model  and  specify 
its  decision-consequence  utility  structure  in  a  way  that  causes  two  of  the  decision 
lines  used  by  the  ideal  observer  to  overlap  and  the  third  line  to  become  undefined. 
Finally,  we  show  that,  for  a  particular  and  obvious  choice  of  ideal-observer-related 
decision  variables,  the  Mossman  decision  rule  cannot  be  a  special  case  of  the  ideal 
observer  decision  rule.  Despite  the  considerable  difficulties  presented  by  the  three- 
class  classification  task,  the  three-class  ideal  observer  provides  a  useful  framework 
for  analyzing  a  variety  of  three-class  decision  strategies. 

Key  words:  ROC  analysis,  three-class  classification,  ideal  observer  decision  rules 


1  Introduction 


We are attempting to develop a fully automated mass lesion classification
scheme for computer-aided diagnosis (CAD) in mammography. This scheme
will combine two schemes developed at the University of Chicago: one for
automatically detecting mass lesions in mammograms (Bick, Giger, Schmidt,
Nishikawa, Wolverton, and Doi, 1995; Yin, Giger, Doi, Metz, Vyborny, and
Schmidt, 1991; Yin, Giger, Vyborny, Doi, and Schmidt, 1993; Yin, Giger, Doi,
Vyborny, and Schmidt, 1994; Kupinski, 2000), and one for classifying known
lesions as malignant or benign (Huo, Giger, Vyborny, Wolverton, Schmidt, and
Doi, 1998; Huo, Giger, and Metz, 1999; Huo, Giger, Vyborny, Wolverton, and
Metz, 2000; Huo, Giger, and Vyborny, 2001; Huo, Giger, Vyborny, and Metz,
2002). Combining these two types of CAD scheme is inherently difficult,
because the output of the detection scheme will necessarily include false-positive
(FP) computer detections in addition to the malignant and benign lesions to
be classified. These FP computer detections correspond to objects which were
by design not included in the training sample of the classification scheme,
because they are not members of the data population (benign and malignant
mass breast lesions) for which the classification scheme was created. It is clear
then that the detection scheme’s output cannot be used unmodified as the
input to the classification scheme.

Our approach has been to treat this problem explicitly as a three-class
classification task. That is, the outputs of the detection scheme should be
classified as malignant lesions, benign lesions, and non-lesions (FP computer
detections), and the classifier to be estimated is the ideal observer decision
rule for this task. Such an approach presents considerable difficulties of its
own. On the one hand, decision rules, in particular ideal observer decision
rules, increase rapidly in complexity with the number of classes involved. On
the other hand, a fully general performance evaluation method, such as a
three-class extension of receiver operating characteristic (ROC) analysis, has
yet to be developed.

It should be mentioned that the simple model we have just described
corresponds in the two-class classification task to ROC analysis performed “per
detection;” that is, each “case” being classified corresponds to a small region
of interest (ROI) in the image containing a single computer detection. Other
formulations, such as ROC analysis “per image,” ROC analysis “per patient”
(for a set of images, such as the four mammographic views obtained in a
typical screening setting), or free-response ROC (FROC) (Bunch, Hamilton,
Sanderson, and Simmons, 1978; Chakraborty, 1989, 2002) analysis, are also
possible, but their extension to tasks with three or more classes is beyond the
scope of the present work.

The explicit form of the decision rule used by the ideal observer in a
three-class classification task has been known for some time (Van Trees, 1968). For
the reasons just stated, however, a practical and general method for
estimating and evaluating observer performance has proven elusive. In particular,
Scurfield (1996) defined the two-class information-based performance metric
$D_{1:2} = \log 2 - \mathrm{AUC} \log \mathrm{AUC} - (1 - \mathrm{AUC}) \log(1 - \mathrm{AUC})$ (where AUC is the
area under the two-class ROC curve), and extended it to the three-class case
for two different decision rules (Scurfield, 1996, 1998). Srinivasan (1999)
investigated the optimality of discrete, multi-class ROC operating points, but not
continuous ROC hypersurfaces, under a cost function equivalent to the Bayes
risk. Mossman (1999) evaluated the performance of a three-class classifier with
a surface formed from the three correct classification probabilities. Hand and
Till (2001) proposed the average of the areas under all $N(N-1)/2$
between-class ROC curves as a performance metric in an $N$-class classification task.
Obuchowski, Applegate, Goske, Arheart, Myers, and Morrison (2001) elicited
readers’ estimates of the set of probabilities of each observation belonging to
$N$ classes, and then used conventional (two-class) ROC analysis to evaluate
each of the $N(N-1)/2$ differences of these estimates for its ability to
distinguish between the relevant pair of classes. Ferri, Hernandez-Orallo, and Salido
(2003) proposed a variety of algorithms for calculating the hypervolume
under the convex hull obtained from a set of discrete ROC operating points; a
modified version of the Hand and Till metric averaging the $N$ areas under the
ROC surfaces that measure the observer’s ability to distinguish a given class
from the remaining $N - 1$; and a graphical “cobweb” representation of the
observer’s misclassification probabilities. Lachiche and Flach (2003) proposed
iterative algorithms for finding the optimal among a discrete set of multi-class
ROC operating points based on either percent correct or Bayes risk. Nakas
and Yiannoutsos (2004) considered an observer using a decision rule similar
to that of Scurfield (1996), and evaluated its performance statistically by
extending methods proposed by Dreiseitl, Ohno-Machado, and Binder (2000).
Patel and Markey (2005) applied a variety of proposed evaluation metrics,
including the Hand and Till metric, the modified Hand and Till metric of
Ferri, the “cobweb” graphical measure of Ferri, and the Mossman ROC
surface, to radiologist assessment data of mammographic images from patients
who subsequently underwent biopsy.
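As an illustration of one of the metrics listed above, the following is a simplified sketch of a Hand-and-Till-style average of pairwise areas. Several details are assumptions on our part and are not taken from the text: each pairwise area is estimated with the Mann-Whitney statistic, each between-class area is the mean of two one-directional areas (one using the estimated probability of class $i$ as the score, one using that of class $j$), classes are numbered from 0, and the data are invented for the example.

```python
from itertools import combinations


def auc(pos, neg):
    """Mann-Whitney estimate of P(pos score > neg score); ties count 1/2."""
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def pairwise_auc_average(probs, labels, n_classes):
    """Average of the N(N-1)/2 between-class areas (simplified sketch)."""
    pairs = list(combinations(range(n_classes), 2))
    total = 0.0
    for i, j in pairs:
        in_i = [p for p, y in zip(probs, labels) if y == i]
        in_j = [p for p, y in zip(probs, labels) if y == j]
        # Separate class i from class j using the class-i probability ...
        a_i = auc([p[i] for p in in_i], [p[i] for p in in_j])
        # ... and class j from class i using the class-j probability.
        a_j = auc([p[j] for p in in_j], [p[j] for p in in_i])
        total += 0.5 * (a_i + a_j)
    return total / len(pairs)


# Hypothetical probability estimates for six observations, three classes.
labels = [0, 0, 1, 1, 2, 2]
probs = [[0.8, 0.1, 0.1], [0.7, 0.2, 0.1],
         [0.1, 0.8, 0.1], [0.2, 0.6, 0.2],
         [0.1, 0.1, 0.8], [0.2, 0.2, 0.6]]

metric = pairwise_auc_average(probs, labels, n_classes=3)
print(metric)
```

Because every class in this invented data set is perfectly separated from every other, each pairwise area, and hence the average, equals one.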

The works cited above demonstrate the difficulty in developing a fully general
performance metric for classification tasks with more than two classes. Lacking
such a performance metric in turn makes the development of observer
decision rules for such tasks difficult, because they can at present be evaluated
and compared only from a theoretical rather than an empirical perspective.
Nevertheless, observer decision rule models for three-class classification tasks
have been proposed relatively recently by several groups of researchers. In
some cases, these models are motivated more by considerations of tractability
than of complete generality. This is of course understandable given the
inherent difficulties of three-class classification; however, we thought it might be
of interest to analyze a number of recently proposed three-class decision rule
models within an ideal observer decision rule framework.

In the next section, we review the three-class ideal observer decision rule. In the following three sections, we review recently proposed three-class decision rule models: one by Scurfield (1998), one by Chan, Sahiner, Hadjiiski, Petrick, and Zhou (2003), and one by Mossman (1999). In each case, the given decision rule is analyzed in terms of the ideal observer decision rule; where necessary or expedient, assumptions are made about the observer’s decision variables in order to facilitate this analysis. We emphasize that we do not attempt a review of the experimental methods or detailed analysis of proposed performance evaluation metrics in the works discussed; we are here interested only in the form of the decision rule which serves as the starting point for each work, and only superficially in the proposed evaluation metrics inasmuch as they are related to those decision rules. (Because of the lack of a fully general performance metric, or figure of merit, for the three-class classification task, and in particular because of the apparent inconsistencies which are obtained from a straightforward generalization of the area under the ROC curve (Edwards, Metz, and Nishikawa, 2005), we do not attempt any validation or quantitative comparison of the proposed performance metrics.) The results of our analyses are briefly summarized in Sec. 6.


2  The Three-Class Ideal Observer


It can be shown (Van Trees, 1968; Edwards, Metz, and Kupinski, 2004b) that an $N$-class ideal observer makes decisions regarding statistically variable observations $\mathbf{x}$ by partitioning a likelihood ratio decision variable space, where the boundaries of the partitions are given by hyperplanes:

decide $\mathbf{d} = \pi_i$ iff

$$\sum_{k=1}^{N-1} (U_{i|k} - U_{j|k})\, P(\mathbf{t} = \pi_k)\, \mathrm{LR}_k > (U_{j|N} - U_{i|N})\, P(\mathbf{t} = \pi_N) \qquad \{j < i\} \tag{1}$$

and

$$\sum_{k=1}^{N-1} (U_{i|k} - U_{j|k})\, P(\mathbf{t} = \pi_k)\, \mathrm{LR}_k > (U_{j|N} - U_{i|N})\, P(\mathbf{t} = \pi_N) \qquad \{j > i\}. \tag{2}$$

Here $U_{i|j}$ is the utility of deciding an observation is from class $\pi_i$ given that it is actually from class $\pi_j$, and the $N-1$ likelihood ratios are defined as

$$\mathrm{LR}_k = \frac{p_{\mathbf{x}}(\mathbf{x} \mid \mathbf{t} = \pi_k)}{p_{\mathbf{x}}(\mathbf{x} \mid \mathbf{t} = \pi_N)} \tag{3}$$

for $k < N$. We also define the actual class (the “truth”) to which an observation belongs as $\mathbf{t}$, and the class to which it is assigned (the “decision”) as $\mathbf{d}$, where $\mathbf{t}$ and $\mathbf{d}$ can take on any of the values $\pi_1, \ldots, \pi_i, \ldots, \pi_N$, the labels of the various classes. (We use boldface type to denote statistically variable quantities.) For simplicity, we will usually write $\pi_k$ to denote the event $\mathbf{t} = \pi_k$, as in the a priori probability $P(\pi_k)$.

The partitioning of the decision variable space is determined by the parameters

$$\gamma_{ijk} = (U_{i|k} - U_{j|k})\, P(\pi_k), \tag{4}$$

with $i$, $j$, and $k$ varying from 1 to $N$, and $j \neq i$. Note that these parameters are not independent, however, because

$$\gamma_{ijk} = \gamma_{kjk} - \gamma_{kik}. \tag{5}$$

We can impose the reasonable condition that the utility for correctly classifying an observation from a given class should be greater than any utility for incorrectly classifying an observation from the same class, i.e., $U_{i|i} > U_{j|i}$ $\{i \neq j\}$. This gives, for $j \neq i$,

$$\gamma_{iji} > 0, \tag{6}$$

leaving $N(N-1)$ parameters (the rest are derivable from (5)).

Finally, note that the hyperplanes represented by (1) and (2) are unchanged if we multiply all of these relations by a single scalar, such as $1 / \bigl(\sum_{i \neq j} \gamma_{iji}\bigr)$. This leaves us with $N^2 - N - 1$ degrees of freedom, as expected, and effectively imposes the condition

$$\sum_{i \neq j} \gamma_{iji} = 1. \tag{7}$$
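As a rough illustration of (4)-(7), the sketch below builds the $\gamma_{ijk}$ parameters from a utility matrix and a set of prior probabilities, normalizes them as in (7), and checks relation (5). The function names (`gamma_params`, `normalize`) and the example utilities and priors are assumptions made purely for illustration; they do not come from the works discussed.

```python
from fractions import Fraction
from itertools import product


def gamma_params(utilities, priors):
    """Return {(i, j, k): gamma_ijk} following (4), with 1-based class indices.

    utilities[i - 1][k - 1] is U_{i|k}: the utility of deciding class i
    when the true class is k.
    """
    n = len(priors)
    return {
        (i, j, k): (utilities[i - 1][k - 1] - utilities[j - 1][k - 1]) * priors[k - 1]
        for i, j, k in product(range(1, n + 1), repeat=3)
        if i != j
    }


def normalize(gamma, n):
    """Rescale all parameters so that the gamma_iji (i != j) sum to one, as in (7)."""
    total = sum(
        gamma[(i, j, i)] for i, j in product(range(1, n + 1), repeat=2) if i != j
    )
    return {key: value / total for key, value in gamma.items()}


# Hypothetical example: unit utility for correct decisions, zero otherwise,
# and equal prior probabilities for three classes.
utilities = [[1, 0, 0], [0, 1, 0], [0, 0, 1]]
priors = [Fraction(1, 3)] * 3
gamma = normalize(gamma_params(utilities, priors), 3)

# Relation (5): gamma_ijk = gamma_kjk - gamma_kik whenever k differs from i and j.
relation_5_holds = all(
    gamma[(i, j, k)] == gamma[(k, j, k)] - gamma[(k, i, k)]
    for (i, j, k) in gamma
    if k != i and k != j
)
```

Exact rational arithmetic is used so that the identities can be compared with `==` rather than with a floating-point tolerance.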


The behavior of a three-class ideal observer is completely determined by the three decision boundary lines

$$\gamma_{121}\mathrm{LR}_1 - \gamma_{212}\mathrm{LR}_2 = \gamma_{313} - \gamma_{323} \tag{8}$$

$$\gamma_{131}\mathrm{LR}_1 + (\gamma_{232} - \gamma_{212})\mathrm{LR}_2 = \gamma_{313} \tag{9}$$

$$(\gamma_{131} - \gamma_{121})\mathrm{LR}_1 + \gamma_{232}\mathrm{LR}_2 = \gamma_{323}, \tag{10}$$

which we call, respectively, the “1-vs.-2” line, the “1-vs.-3” line, and the “2-vs.-3” line. Note that if any two of these lines intersect, the third line must also share this intersection point. We also emphasize the simple interpretation, from (4), of each of the $\gamma_{iji}$ parameters appearing in these decision boundary line equations as the difference in utilities between a “correct” and one particular “incorrect” decision (scaled by the a priori probability of the true class in question); and of each difference in the $\gamma_{iji}$ parameters as a difference in utilities between two possible “incorrect” decisions (again scaled by the a priori probability of the true class in question).

Fig. 1. Example three-class ideal observer decision rule, given the values of the decision parameters $\gamma_{121} = \gamma_{212} = 3/14$ and $\gamma_{131} = \gamma_{313} = \gamma_{232} = \gamma_{323} = 1/7$. Note that $\gamma_{iji} = (U_{i|i} - U_{j|i})\, P(\mathbf{t} = \pi_i)$.

An example ideal observer decision rule for particular values of the utilities $U_{i|j}$, and hence of the parameters $\gamma_{iji}$, is shown in Fig. 1. Here we have chosen $\gamma_{121} = \gamma_{212} = 3/14$ and $\gamma_{131} = \gamma_{313} = \gamma_{232} = \gamma_{323} = 1/7$, yielding the decision boundary lines

$$\tfrac{3}{14}\mathrm{LR}_1 - \tfrac{3}{14}\mathrm{LR}_2 = 0 \qquad \{\text{“1-vs.-2”}\} \tag{11}$$

$$\tfrac{1}{7}\mathrm{LR}_1 - \tfrac{1}{14}\mathrm{LR}_2 = \tfrac{1}{7} \qquad \{\text{“1-vs.-3”}\} \tag{12}$$

$$-\tfrac{1}{14}\mathrm{LR}_1 + \tfrac{1}{7}\mathrm{LR}_2 = \tfrac{1}{7} \qquad \{\text{“2-vs.-3”}\}. \tag{13}$$

These simplify to the equations $\mathrm{LR}_2 = \mathrm{LR}_1$, $\mathrm{LR}_2 = 2\mathrm{LR}_1 - 2$, and $\mathrm{LR}_2 = \mathrm{LR}_1/2 + 1$, respectively.
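The Fig. 1 example can be checked numerically. The following sketch, in which the helper names `boundary_lines` and `intersection` are hypothetical, forms the coefficient triples of (8)-(10) from the six $\gamma_{iji}$ values and confirms that all three lines pass through a common point, as stated in the text.

```python
from fractions import Fraction as F


def boundary_lines(g121, g212, g131, g313, g232, g323):
    """Return (a, b, c) triples for a*LR1 + b*LR2 = c, following (8)-(10)."""
    return {
        "1-vs-2": (g121, -g212, g313 - g323),
        "1-vs-3": (g131, g232 - g212, g313),
        "2-vs-3": (g131 - g121, g232, g323),
    }


def intersection(line_a, line_b):
    """Intersect two lines a*x + b*y = c by Cramer's rule; None if parallel."""
    (a1, b1, c1), (a2, b2, c2) = line_a, line_b
    det = a1 * b2 - a2 * b1
    if det == 0:
        return None
    return ((c1 * b2 - c2 * b1) / det, (a1 * c2 - a2 * c1) / det)


# Parameter values used for Fig. 1.
lines = boundary_lines(F(3, 14), F(3, 14), F(1, 7), F(1, 7), F(1, 7), F(1, 7))
point = intersection(lines["1-vs-2"], lines["1-vs-3"])

# The "2-vs.-3" line should pass through the same point.
a, b, c = lines["2-vs-3"]
third_line_shares_point = a * point[0] + b * point[1] == c
```

For these parameter values the common intersection point is $(\mathrm{LR}_1, \mathrm{LR}_2) = (2, 2)$, which also satisfies each of the simplified equations quoted above.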


3  The Scurfield Decision Rule

Scurfield investigated a decision rule applied to two-dimensional statistically variable data ($\mathbf{y} = (y_1, y_2)$) drawn from three classes (Scurfield, 1998). The application domain was human observer performance modeling for acoustical psychophysics experiments. (In prior work, Scurfield investigated a decision rule for three-class classification of univariate data (Scurfield, 1996). We will not review that prior work here, because at present we are interested in relating given observer models to the general three-class ideal observer model for multivariate observational data, which, except in degenerate cases, will yield two-dimensional decision variable data by (3).) In Scurfield’s work, no assumptions are made about the decision variables $y_1$ and $y_2$; in particular, these decision variables are not assumed to be related in any way to an ideal observer model. This is entirely appropriate given the nature of the problem domain Scurfield investigated, i.e., human observer performance modeling. It can readily be shown, however, that if one chooses to make such assumptions, special cases of the Scurfield model are in fact special cases of an ideal observer decision rule.

Fig. 2. Decision rule investigated by Scurfield, for the decision parameters $\gamma_1$ and $\gamma_2$.


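The Scurfield decision rule is dependent on two decision parameters, which we will call $\gamma_1$ and $\gamma_2$. The decision rule can be written as

$$\text{decide } \mathbf{d} = \pi_1 \text{ iff } y_1 - y_2 \geq \gamma_1 - \gamma_2 \text{ and } y_1 > \gamma_1; \tag{14}$$

$$\text{decide } \mathbf{d} = \pi_2 \text{ iff } y_1 - y_2 < \gamma_1 - \gamma_2 \text{ and } y_2 > \gamma_2; \tag{15}$$

$$\text{decide } \mathbf{d} = \pi_3 \text{ iff } y_1 < \gamma_1 \text{ and } y_2 < \gamma_2. \tag{16}$$

This decision rule is illustrated in Fig. 2.

From these relations, one can define the decision boundary lines

$$y_1 - y_2 = \gamma_1 - \gamma_2 \qquad \{\text{“1-vs.-2”}\} \tag{17}$$

$$y_1 = \gamma_1 \qquad \{\text{“1-vs.-3”}\} \tag{18}$$

$$y_2 = \gamma_2 \qquad \{\text{“2-vs.-3”}\}. \tag{19}$$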

If we choose $y_1 = \mathrm{LR}_1(\mathbf{x})$ and $y_2 = \mathrm{LR}_2(\mathbf{x})$ for some set of observational data $\mathbf{x}$, we have

$$\frac{1}{\gamma_0}\mathrm{LR}_1 - \frac{1}{\gamma_0}\mathrm{LR}_2 = \frac{\gamma_1}{\gamma_0} - \frac{\gamma_2}{\gamma_0} \qquad \{\text{“1-vs.-2”}\} \tag{20}$$

$$\frac{1}{\gamma_0}\mathrm{LR}_1 = \frac{\gamma_1}{\gamma_0} \qquad \{\text{“1-vs.-3”}\} \tag{21}$$

$$\frac{1}{\gamma_0}\mathrm{LR}_2 = \frac{\gamma_2}{\gamma_0} \qquad \{\text{“2-vs.-3”}\}, \tag{22}$$

where $\gamma_0 = \gamma_1 + \gamma_2 + 4$ (to impose consistency with (7)). Note the similarity in form between these equations and (8)-(10). If we require $\gamma_1$ and $\gamma_2$ to be positive, the correspondence is exact, and this special case of (8)-(10) is illustrated in Fig. 3. (In fact, the intersection of the ideal observer decision boundary lines can lie in any quadrant. However, given a set of decision boundary lines with slopes as depicted in Fig. 2, the occurrence of the intersection point in any quadrant other than the first would result in an ideal observer operating point for which no observations were assigned to class $\pi_3$. This “degenerate” case will not be considered here.) As an aside, it is of some interest to note that if $\gamma_1 = \gamma_2 = 1$, the decision boundary line equations reduce to $\mathrm{LR}_1 = \mathrm{LR}_2$, yielding $p(\mathbf{x}|\pi_1) = p(\mathbf{x}|\pi_2)$; $\mathrm{LR}_1 = 1$, yielding $p(\mathbf{x}|\pi_1) = p(\mathbf{x}|\pi_3)$; and $\mathrm{LR}_2 = 1$, yielding $p(\mathbf{x}|\pi_2) = p(\mathbf{x}|\pi_3)$. That is, the decision boundary lines correspond, in the observational data space, to the loci of intersection of the observational data probability density functions. (This is illustrated in Figs. 2B and 2C of Scurfield (1998).)

Fig. 3. A special case of the ideal observer decision rule with $\gamma_{121} = \gamma_{212} = \gamma_{131} = \gamma_{232} = 1/(\gamma_1 + \gamma_2 + 4)$, $\gamma_{313} = \gamma_1/(\gamma_1 + \gamma_2 + 4)$, and $\gamma_{323} = \gamma_2/(\gamma_1 + \gamma_2 + 4)$. The parameters $\gamma_1$ and $\gamma_2$ are positive but otherwise arbitrary; this decision rule is a special case of the Scurfield decision rule with $y_1 = \mathrm{LR}_1(\mathbf{x})$ and $y_2 = \mathrm{LR}_2(\mathbf{x})$.

A second correspondence between Scurfield’s decision rule and the ideal observer decision rule can be obtained by taking $y_1 = \log(\mathrm{LR}_1(\mathbf{x}))$ and $y_2 = \log(\mathrm{LR}_2(\mathbf{x}))$, with $\gamma_1$ and $\gamma_2$ now unrestricted. Substituting this definition in (17)-(19), we obtain

$$\log(\mathrm{LR}_1) - \log(\mathrm{LR}_2) = \gamma_1 - \gamma_2 \qquad \{\text{“1-vs.-2”}\} \tag{23}$$

$$\log(\mathrm{LR}_1) = \gamma_1 \qquad \{\text{“1-vs.-3”}\} \tag{24}$$

$$\log(\mathrm{LR}_2) = \gamma_2 \qquad \{\text{“2-vs.-3”}\}. \tag{25}$$

Taking exponentials on each side of these equations then gives

$$\frac{\mathrm{LR}_1}{\mathrm{LR}_2} = e^{\gamma_1 - \gamma_2} \qquad \{\text{“1-vs.-2”}\} \tag{26}$$

$$\mathrm{LR}_1 = e^{\gamma_1} \qquad \{\text{“1-vs.-3”}\} \tag{27}$$

$$\mathrm{LR}_2 = e^{\gamma_2} \qquad \{\text{“2-vs.-3”}\}; \tag{28}$$

we can then rearrange terms and divide the equations by a constant factor to obtain

$$\frac{e^{-\gamma_1}}{\gamma_0}\mathrm{LR}_1 - \frac{e^{-\gamma_2}}{\gamma_0}\mathrm{LR}_2 = 0 \qquad \{\text{“1-vs.-2”}\} \tag{29}$$

$$\frac{e^{-\gamma_1}}{\gamma_0}\mathrm{LR}_1 = \frac{1}{\gamma_0} \qquad \{\text{“1-vs.-3”}\} \tag{30}$$

$$\frac{e^{-\gamma_2}}{\gamma_0}\mathrm{LR}_2 = \frac{1}{\gamma_0} \qquad \{\text{“2-vs.-3”}\}, \tag{31}$$

where $\gamma_0 = 2(e^{-\gamma_1} + e^{-\gamma_2} + 1)$. By inspection, this is again a special case of (8)-(10), which is illustrated in Fig. 4. (This special case is currently the subject of independent analysis by He, Metz, Tsui, Links, and Frey (2006).) As an aside, we note that if $\gamma_1 = \gamma_2 = 0$, the resulting decision boundary lines again correspond, in the observational data space, to the loci of intersection of the observational data probability density functions, as was pointed out in the text following (20)-(22).
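The first correspondence, between the Scurfield rule with $y_i = \mathrm{LR}_i$ and the ideal observer with the Fig. 3 parameters, can be spot-checked on a grid of likelihood ratio values. The sketch below is only illustrative: the function names, the values $\gamma_1 = 3/2$ and $\gamma_2 = 1/3$, and the grid are assumptions, chosen so that no grid point falls exactly on a decision boundary (ties are therefore not exercised).

```python
from fractions import Fraction as F


def scurfield_decide(y1, y2, g1, g2):
    """Scurfield rule (14)-(16); returns the decided class label 1, 2, or 3."""
    if y1 - y2 >= g1 - g2 and y1 > g1:
        return 1
    if y1 - y2 < g1 - g2 and y2 > g2:
        return 2
    return 3  # away from the boundaries this is exactly condition (16)


def ideal_decide(lr1, lr2, g121, g212, g131, g313, g232, g323):
    """Three-class ideal observer, from the signs of the residuals of (8)-(10)."""
    f12 = g121 * lr1 - g212 * lr2 - (g313 - g323)
    f13 = g131 * lr1 + (g232 - g212) * lr2 - g313
    f23 = (g131 - g121) * lr1 + g232 * lr2 - g323
    if f12 > 0 and f13 > 0:
        return 1
    if f12 < 0 and f23 > 0:
        return 2
    return 3


# Hypothetical decision parameters, and the corresponding Fig. 3 values.
g1, g2 = F(3, 2), F(1, 3)
g0 = g1 + g2 + 4
fig3 = dict(g121=1 / g0, g212=1 / g0, g131=1 / g0, g232=1 / g0,
            g313=g1 / g0, g323=g2 / g0)

# Odd multiples of 1/8, so that no point lies on any of the three boundaries.
grid = [F(2 * k + 1, 8) for k in range(16)]
decisions = [
    (scurfield_decide(a, b, g1, g2), ideal_decide(a, b, **fig3))
    for a in grid
    for b in grid
]
rules_agree = all(s == i for s, i in decisions)
```

Agreement on a finite grid is of course not a proof; it merely illustrates the algebraic identity between (20)-(22) and (8)-(10) for one choice of parameters.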

Finally, if we take $y_1 = P(\pi_1|\mathbf{x})$ and $y_2 = P(\pi_2|\mathbf{x})$, and require $0 < \gamma_1 < 1$ and $0 < \gamma_2 < 1$, we obtain

$$P(\pi_1|\mathbf{x}) - P(\pi_2|\mathbf{x}) = \gamma_1 - \gamma_2 \qquad \{\text{“1-vs.-2”}\} \tag{32}$$

$$P(\pi_1|\mathbf{x}) = \gamma_1 \qquad \{\text{“1-vs.-3”}\} \tag{33}$$

$$P(\pi_2|\mathbf{x}) = \gamma_2 \qquad \{\text{“2-vs.-3”}\}, \tag{34}$$

as illustrated in Fig. 5.

Fig. 4. A special case of the ideal observer decision rule with $\gamma_{121} = \gamma_{131} = e^{-\gamma_1}/\gamma_0$, $\gamma_{212} = \gamma_{232} = e^{-\gamma_2}/\gamma_0$, $\gamma_{313} = \gamma_{323} = 1/\gamma_0$, and $\gamma_0 = 2(e^{-\gamma_1} + e^{-\gamma_2} + 1)$. The parameters $\gamma_1$ and $\gamma_2$ are arbitrary; this decision rule is a special case of the Scurfield decision rule with $y_1 = \log(\mathrm{LR}_1(\mathbf{x}))$ and $y_2 = \log(\mathrm{LR}_2(\mathbf{x}))$.

Fig. 5. A special case of the Scurfield decision rule with $y_1 = P(\pi_1|\mathbf{x})$ and $y_2 = P(\pi_2|\mathbf{x})$.

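Note that (3) can be written as

$$\mathrm{LR}_i = \frac{P(\pi_i|\mathbf{x})\, p(\mathbf{x}) / P(\pi_i)}{p(\mathbf{x}|\pi_3)} \qquad \{i : 1 \leq i \leq 2\}$$

$$P(\pi_i|\mathbf{x}) = \frac{\mathrm{LR}_i\, P(\pi_i)}{p(\mathbf{x}) / p(\mathbf{x}|\pi_3)}$$

$$P(\pi_i|\mathbf{x}) = \frac{\mathrm{LR}_i\, [P(\pi_i)/P(\pi_3)]}{1 + \mathrm{LR}_1 [P(\pi_1)/P(\pi_3)] + \mathrm{LR}_2 [P(\pi_2)/P(\pi_3)]}, \tag{35}$$

where the last step uses $p(\mathbf{x}) = \sum_{k=1}^{3} P(\pi_k)\, p(\mathbf{x}|\pi_k)$. This allows us to rewrite (32)-(34) as

$$\frac{1 - (\gamma_1 - \gamma_2)}{\gamma_0} \frac{P(\pi_1)}{P(\pi_3)} \mathrm{LR}_1 - \frac{1 + (\gamma_1 - \gamma_2)}{\gamma_0} \frac{P(\pi_2)}{P(\pi_3)} \mathrm{LR}_2 = \frac{\gamma_1 - \gamma_2}{\gamma_0} \tag{36}$$

$$\frac{1 - \gamma_1}{\gamma_0} \frac{P(\pi_1)}{P(\pi_3)} \mathrm{LR}_1 - \frac{\gamma_1}{\gamma_0} \frac{P(\pi_2)}{P(\pi_3)} \mathrm{LR}_2 = \frac{\gamma_1}{\gamma_0} \tag{37}$$

$$-\frac{\gamma_2}{\gamma_0} \frac{P(\pi_1)}{P(\pi_3)} \mathrm{LR}_1 + \frac{1 - \gamma_2}{\gamma_0} \frac{P(\pi_2)}{P(\pi_3)} \mathrm{LR}_2 = \frac{\gamma_2}{\gamma_0}, \tag{38}$$

respectively, where $\gamma_0 = (2 - 2\gamma_1 + \gamma_2) P(\pi_1)/P(\pi_3) + (2 + \gamma_1 - 2\gamma_2) P(\pi_2)/P(\pi_3) + \gamma_1 + \gamma_2$. This is again a special case of (8)-(10), as the quantities $1 - (\gamma_1 - \gamma_2)$, $1 + (\gamma_1 - \gamma_2)$, $1 - \gamma_1$, and $1 - \gamma_2$ are all positive given $0 < \gamma_1 < 1$ and $0 < \gamma_2 < 1$.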
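The substitution of (35) into (32)-(34) can be verified with exact arithmetic. In the sketch below, the names `posteriors`, `line_residuals`, and `gamma_0`, together with the priors and decision parameters, are hypothetical choices for illustration. Each residual of (32)-(34), multiplied by the (positive) denominator of (35), should equal the corresponding residual of (36)-(38) multiplied by $\gamma_0$.

```python
from fractions import Fraction as F


def posteriors(lr1, lr2, a1, a2):
    """Equation (35) with a_i = P(pi_i) / P(pi_3); also returns the denominator."""
    den = 1 + a1 * lr1 + a2 * lr2
    return a1 * lr1 / den, a2 * lr2 / den, den


def line_residuals(lr1, lr2, g1, g2, a1, a2):
    """Left side minus right side of (36)-(38), multiplied through by gamma_0."""
    r12 = (1 - (g1 - g2)) * a1 * lr1 - (1 + (g1 - g2)) * a2 * lr2 - (g1 - g2)
    r13 = (1 - g1) * a1 * lr1 - g1 * a2 * lr2 - g1
    r23 = -g2 * a1 * lr1 + (1 - g2) * a2 * lr2 - g2
    return r12, r13, r23


def gamma_0(g1, g2, a1, a2):
    """Normalizing constant quoted after (36)-(38)."""
    return (2 - 2 * g1 + g2) * a1 + (2 + g1 - 2 * g2) * a2 + g1 + g2


# Hypothetical values: priors (1/2, 1/3, 1/6) give a1 = 3 and a2 = 2.
a1, a2 = F(3), F(2)
g1, g2 = F(2, 5), F(3, 10)


def consistent(lr1, lr2):
    """Compare the residuals of (32)-(34), scaled by den, with those of (36)-(38)."""
    p1, p2, den = posteriors(lr1, lr2, a1, a2)
    r12, r13, r23 = line_residuals(lr1, lr2, g1, g2, a1, a2)
    return (
        (p1 - p2 - (g1 - g2)) * den == r12
        and (p1 - g1) * den == r13
        and (p2 - g2) * den == r23
    )


all_consistent = all(
    consistent(F(i, 4), F(j, 4)) for i in range(1, 9) for j in range(1, 9)
)

# The six gamma_iji values read off from (36)-(38), before division by gamma_0.
six = [(1 - (g1 - g2)) * a1, (1 - g1) * a1,
       (1 + (g1 - g2)) * a2, (1 - g2) * a2, g1, g2]
sums_to_gamma_0 = sum(six) == gamma_0(g1, g2, a1, a2)
```

The final check confirms that dividing by the quoted $\gamma_0$ is what enforces the normalization condition (7).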


Scurfield (1998) points out that the observer which maximizes $P_C$, the “percent correct” or probability of a correct response, is a special case of the ideal observer (i.e., a single operating point achievable by the ideal observer for the given task). This observer follows the Scurfield decision rule model with $y_1 = \log(\mathrm{LR}_1(\mathbf{x}))$ and $y_2 = \log(\mathrm{LR}_2(\mathbf{x}))$, and decision parameters given by $e^{\gamma_1} = P(\pi_3)/P(\pi_1)$ and $e^{\gamma_2} = P(\pi_3)/P(\pi_2)$. It is interesting to note that the Scurfield decision rule model can in fact be used to describe ideal observer performance for an even wider class of operating points, as shown in this section.
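To make the percent-correct operating point concrete, the following sketch compares a maximum a posteriori decision (which maximizes the probability of a correct response) with the Scurfield rule applied to log-likelihood ratios, using $\gamma_1 = \log[P(\pi_3)/P(\pi_1)]$ and $\gamma_2 = \log[P(\pi_3)/P(\pi_2)]$. The function names, priors, and test points are assumptions for illustration; the points were chosen to stay clear of the decision boundaries so that floating-point rounding does not matter.

```python
import math


def map_decide(lr1, lr2, p1, p2, p3):
    """Pick the class with the largest posterior probability.

    The posteriors are proportional to P(pi_k) * LR_k (with LR_3 = 1); the
    common factor p(x | pi_3) / p(x) cancels in the comparison.
    """
    scores = (p1 * lr1, p2 * lr2, p3)
    return 1 + scores.index(max(scores))


def scurfield_log_decide(lr1, lr2, g1, g2):
    """Scurfield rule (14)-(16) applied to y_i = log(LR_i)."""
    y1, y2 = math.log(lr1), math.log(lr2)
    if y1 - y2 >= g1 - g2 and y1 > g1:
        return 1
    if y1 - y2 < g1 - g2 and y2 > g2:
        return 2
    return 3


p1, p2, p3 = 0.5, 0.3, 0.2  # hypothetical prior probabilities
g1, g2 = math.log(p3 / p1), math.log(p3 / p2)

# Grid steps chosen so that no point lies on or very near a decision boundary.
points = [(0.07 * k, 0.11 * m) for k in range(1, 31) for m in range(1, 31)]
rules_agree = all(
    map_decide(a, b, p1, p2, p3) == scurfield_log_decide(a, b, g1, g2)
    for a, b in points
)
```

With these parameters the “1-vs.-3” condition $\log \mathrm{LR}_1 > \gamma_1$ is simply $P(\pi_1)\mathrm{LR}_1 > P(\pi_3)$, which is the comparison the posterior-maximizing rule makes between classes $\pi_1$ and $\pi_3$.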


To evaluate the performance of an observer using the decision rule in (17)-(19), Scurfield plots a set of six surfaces in three-dimensional ROC spaces, giving $P(\mathbf{d} = \pi_2 | \mathbf{t} = \sigma(\pi_2))$ as a function of $P(\mathbf{d} = \pi_1 | \mathbf{t} = \sigma(\pi_1))$ and $P(\mathbf{d} = \pi_3 | \mathbf{t} = \sigma(\pi_3))$. Here $\sigma$ is one of the six possible permutations of three symbols. Scurfield gives a probabilistic interpretation for this evaluation methodology: the volume under each surface is the probability of a particular outcome in a three-alternative forced choice experiment, and thus the six volumes must sum to one. This constraint means that at most five of the surfaces are independent. However, given the number of conditional probabilities $P(\mathbf{d} = \pi_i | \mathbf{t} = \pi_j)$ involved, one can show that only four such surfaces are required to completely specify the tradeoffs among the observer’s conditional classification probabilities. Without loss of generality, we consider plotting each of $P(\mathbf{d} = \pi_2 | \mathbf{t} = \pi_1)$, $P(\mathbf{d} = \pi_2 | \mathbf{t} = \pi_3)$, $P(\mathbf{d} = \pi_3 | \mathbf{t} = \pi_1)$, and $P(\mathbf{d} = \pi_3 | \mathbf{t} = \pi_2)$ as functions of $P(\mathbf{d} = \pi_1 | \mathbf{t} = \pi_2)$ and $P(\mathbf{d} = \pi_1 | \mathbf{t} = \pi_3)$. (As with Scurfield’s plots, these are well defined because Scurfield’s decision rule has two degrees of freedom, namely the parameters $\gamma_1$ and $\gamma_2$.)

Now consider one of Scurfield’s plots, for example that which gives $P(\mathbf{d} = \pi_2 | \mathbf{t} = \pi_2)$ as a function of $P(\mathbf{d} = \pi_1 | \mathbf{t} = \pi_1)$ and $P(\mathbf{d} = \pi_3 | \mathbf{t} = \pi_3)$. Because these are conditional probabilities, we have

$$P(\mathbf{d} = \pi_1 | \mathbf{t} = \pi_1) = 1 - P(\mathbf{d} = \pi_2 | \mathbf{t} = \pi_1) - P(\mathbf{d} = \pi_3 | \mathbf{t} = \pi_1) \tag{39}$$

$$P(\mathbf{d} = \pi_2 | \mathbf{t} = \pi_2) = 1 - P(\mathbf{d} = \pi_1 | \mathbf{t} = \pi_2) - P(\mathbf{d} = \pi_3 | \mathbf{t} = \pi_2) \tag{40}$$

$$P(\mathbf{d} = \pi_3 | \mathbf{t} = \pi_3) = 1 - P(\mathbf{d} = \pi_1 | \mathbf{t} = \pi_3) - P(\mathbf{d} = \pi_2 | \mathbf{t} = \pi_3). \tag{41}$$

Each of the conditional probabilities on the right hand side of these equations can be written as a function of $P(\mathbf{d} = \pi_1 | \mathbf{t} = \pi_2)$ and $P(\mathbf{d} = \pi_1 | \mathbf{t} = \pi_3)$ in our formulation; thus the surface given in this plot is determined parametrically by the set of four surfaces we have given. Similar remarks hold for the other five surfaces used by Scurfield. In general, for an $N$-class classification task using a Scurfield-type decision rule with $N - 1$ degrees of freedom (the generalization to $N$ classes of (17)-(19)), one can show that a set of $(N - 1)^2$ hypersurfaces with $N - 1$ degrees of freedom in $N$-dimensional ROC spaces is necessary to fully characterize the observer’s performance, although the interpretation of those hypersurfaces is not necessarily as straightforward or elegant as that provided for the $N! - 1$ hypersurfaces used by Scurfield.
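One counting argument that reproduces the numbers quoted above is sketched here; it is offered as an assumption about how the counts arise, not as the derivation used in the works discussed. There are $N^2$ conditional probabilities, $N$ of which are removed by the normalization constraints (one per true class, as in (39)-(41)); of the remaining $N(N-1)$, a rule with $N - 1$ degrees of freedom lets $N - 1$ serve as independent coordinates, leaving $(N-1)^2$ dependent quantities, each of which defines one hypersurface. The function names are hypothetical.

```python
from math import factorial


def free_probabilities(n):
    """N^2 conditional probabilities minus N normalization constraints."""
    return n * n - n


def surfaces_needed(n):
    """Dependent probabilities once N - 1 are used as independent coordinates."""
    return free_probabilities(n) - (n - 1)


def scurfield_surfaces(n):
    """One surface per permutation of the classes, minus one for the volume-sum constraint."""
    return factorial(n) - 1


counts = {n: (surfaces_needed(n), scurfield_surfaces(n)) for n in range(2, 6)}
```

For $N = 3$ this gives four required surfaces against the five independent surfaces in Scurfield’s construction; for $N = 2$ both counts reduce to the single familiar ROC curve.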


4  The Chan Decision Rule

Chan et al. are investigating three-class classifiers for computer-aided diagnosis (Chan et al., 2003). Their work is motivated by reasoning similar in principle to that which we independently arrived at when we began to consider this problem. In particular, they consider a clinical situation in which observations must be classified as malignant, benign, or normal. The goal of their work is not just the psychophysical measurement of the performance of an existing (e.g., human) observer, but the optimization of the performance of a system (containing components with parameters subject to experimental control, e.g., an artificial neural network) to aid a radiologist or clinician. Thus they are free, at least in theory, to start explicitly from an ideal observer model in constructing their decision rule.

In order to reduce the complexity of the ideal observer decision rule to manageable proportions, Chan et al. impose restrictions on the utilities used by their observer. In their formulation, the class we are labeling $\pi_1$ is the benign class; $\pi_2$, the normal class; and the malignant class is $\pi_3$. They further assume that the possible values of any utility $U_{i|j}$ are restricted to the interval $[0, 1]$. They then set $U_{1|1} = U_{2|2} = U_{3|3} = 1$ (i.e., correctly identifying any case has maximal utility). Furthermore, they require $U_{2|1} = U_{1|2} = 1$ and $U_{1|3} = U_{2|3} = 0$ (i.e., misidentifying a benign case as normal, or vice versa, has no significant cost reducing the utility of such a decision from the maximum, but misclassifying an actually malignant case as benign or normal has the minimum possible utility). Finally, $U_{3|1}$ and $U_{3|2}$ are assumed to have arbitrary values on the open interval $(0, 1)$ (i.e., misclassifying an actually non-malignant case as malignant will have some cost reducing the utility of such a decision from the maximum, but such a misclassification is in some sense “better” than missing an actual malignancy). It is important to note that these assumptions are arguably relevant to a reasonable model of a clinical situation, and are thus of interest beyond their superficial advantage in reducing the degrees of freedom involved in the observer’s decision rule. We will, however, only consider the latter issue in the remainder of this section.


Substituting the values of the utilities given above into (4), we obtain decision boundary lines of the form

$$0 \cdot \mathrm{LR}_1 + 0 \cdot \mathrm{LR}_2 = 0 \qquad \{\text{“1-vs.-2”}\} \tag{42}$$

$$\frac{(1 - U_{3|1}) P(\pi_1)}{\gamma_0} \mathrm{LR}_1 + \frac{(1 - U_{3|2}) P(\pi_2)}{\gamma_0} \mathrm{LR}_2 = \frac{P(\pi_3)}{\gamma_0} \qquad \{\text{“1-vs.-3”}\} \tag{43}$$

$$\frac{(1 - U_{3|1}) P(\pi_1)}{\gamma_0} \mathrm{LR}_1 + \frac{(1 - U_{3|2}) P(\pi_2)}{\gamma_0} \mathrm{LR}_2 = \frac{P(\pi_3)}{\gamma_0} \qquad \{\text{“2-vs.-3”}\} \tag{44}$$

where $\gamma_0 = 1 + P(\pi_3) - U_{3|1} P(\pi_1) - U_{3|2} P(\pi_2)$. Note that, as Chan et al. point out, the “1-vs.-2” line is in fact undefined for this choice of utilities, while the “1-vs.-3” and “2-vs.-3” lines are identical. This is a general consequence of (8)-(10); if any two of these equations yield identical lines, the third line must be undefined. (Note that, strictly speaking, the utility structure employed by Chan et al. is excluded from our formulation by the requirement stated in (6). However, this issue, i.e., whether the ideal observer’s performance should be considered to include such limiting cases, is largely a definitional, rather than a fundamental, issue, because (6) could just as readily have been formulated as a non-negativity constraint, rather than a strict inequality as we have chosen.)

The decision rule considered by Chan et al. is illustrated in Fig. 6. It can be argued that, in a sense, the output of this classifier belongs to only two classes, malignant and non-malignant; in particular, because (42) is undefined, this observer will never unequivocally decide $\mathbf{d} = \pi_1$ (benign) or $\mathbf{d} = \pi_2$ (normal). In fact, if $U_{3|1} = U_{3|2}$, the observer’s performance is identical with that of a two-class ideal observer which distinguishes between the malignant and non-malignant (benign plus normal) classes. However, in the more general case in which $U_{3|1} \neq U_{3|2}$, the observer considered by Chan et al. is able to achieve ROC operating points not accessible by the two-class ideal observer. (That is, the three-class ideal observer can achieve points below the two-class ideal observer’s ROC curve in a two-class ROC space, or, equivalently, points off the curve representing the two-class ideal observer’s performance plotted in a three-class ROC space.) Intuitively, their observer makes decisions based on the three distribution functions of the observational data, even though the observer’s output consists of only two possible responses.

Chan et al. evaluate the performance of their observer by plotting $P(\mathbf{d} = \pi_3 | \mathbf{t} = \pi_3)$ as a function of $P(\mathbf{d} = \pi_3 | \mathbf{t} = \pi_1)$ and $P(\mathbf{d} = \pi_3 | \mathbf{t} = \pi_2)$. Note that this single two-dimensional surface is sufficient to completely characterize the tradeoffs among the conditional classification probabilities of their observer. This is because, as just stated, the observer’s output consists of only two possible responses, and thus we have only six classification probabilities $P(\mathbf{d} = \pi_i | \mathbf{t} = \pi_j)$ rather than the nine expected in a three-class classification task. These six conditional probabilities are still constrained by three equations, however:

$$P(\mathbf{d} = \pi_3 | \mathbf{t} = \pi_1) + P(\mathbf{d} = \bar{\pi}_3 | \mathbf{t} = \pi_1) = 1 \tag{45}$$

$$P(\mathbf{d} = \pi_3 | \mathbf{t} = \pi_2) + P(\mathbf{d} = \bar{\pi}_3 | \mathbf{t} = \pi_2) = 1 \tag{46}$$

$$P(\mathbf{d} = \pi_3 | \mathbf{t} = \pi_3) + P(\mathbf{d} = \bar{\pi}_3 | \mathbf{t} = \pi_3) = 1, \tag{47}$$

where the expression $\mathbf{d} = \bar{\pi}_3$ indicates that the observer decides that the observation does not belong to class $\pi_3$. These constraint equations allow us to eliminate three of the six conditional probabilities, leaving a single ROC surface with two degrees of freedom in a three-dimensional ROC space.

Fig. 6. The decision rule investigated by Chan et al., which is a special case of the ideal observer decision rule with $\gamma_{121} = \gamma_{212} = 0$, $\gamma_{131} = (1 - U_{3|1}) P(\pi_1)/\gamma_0$, $\gamma_{232} = (1 - U_{3|2}) P(\pi_2)/\gamma_0$, and $\gamma_{313} = \gamma_{323} = P(\pi_3)/\gamma_0$; here $\gamma_0 = 1 + P(\pi_3) - U_{3|1} P(\pi_1) - U_{3|2} P(\pi_2)$. Observations in the unlabeled region are decided “not $\pi_3$”, i.e., either “$\pi_1$” or “$\pi_2$”. The intercepts $\gamma_1$ and $\gamma_2$ are $P(\pi_3)/[(1 - U_{3|1}) P(\pi_1)]$ and $P(\pi_3)/[(1 - U_{3|2}) P(\pi_2)]$, respectively.


5  The Mossman Decision Rule


Mossman investigates (Mossman, 1999) a decision rule applied to a set of three decision variables $y_1$, $y_2$, and $y_3$, subject to the constraint

$$y_1 + y_2 + y_3 = 1, \tag{48}$$

as well as $0 \leq y_i \leq 1$ $\{1 \leq i \leq 3\}$. This is consistent with the constraint on the a posteriori class probabilities, $P(\pi_1|\mathbf{x}) + P(\pi_2|\mathbf{x}) + P(\pi_3|\mathbf{x}) = 1$; these quantities are known to be directly related to the likelihood ratio ideal observer decision variables (Kupinski, Edwards, Giger, and Metz, 2001; Edwards, Lan, Metz, Giger, and Nishikawa, 2004a). Mossman does not explicitly require, however, that the decision variables in (48) be the a posteriori class probabilities (e.g., they may be noisy estimates of these quantities).

The decision rule considered by Mossman, which depends on two decision parameters $\gamma_1$ and $\gamma_2$, is

$$\text{decide } \mathbf{d} = \pi_1 \text{ iff } y_2 - y_1 < \gamma_2 \text{ and } y_3 \leq \gamma_1; \tag{49}$$

$$\text{decide } \mathbf{d} = \pi_2 \text{ iff } y_2 - y_1 > \gamma_2 \text{ and } y_3 \leq \gamma_1; \tag{50}$$

$$\text{decide } \mathbf{d} = \pi_3 \text{ iff } y_3 > \gamma_1, \tag{51}$$

where $0 < \gamma_1 < 1$ and $-1 < \gamma_2 < 1$. From these relations, and given the relation $y_3 = 1 - y_1 - y_2$ from (48), one can define the decision boundary lines

$$y_1 - y_2 = -\gamma_2 \qquad \{\text{“1-vs.-2”}\} \tag{52}$$

$$y_1 + y_2 = 1 - \gamma_1 \qquad \{\text{“1-vs.-3”}\} \tag{53}$$

$$y_1 + y_2 = 1 - \gamma_1 \qquad \{\text{“2-vs.-3”}\}. \tag{54}$$

This decision rule is illustrated in Fig. 7. Note that, similar to the Chan et al. decision rule, the “1-vs.-3” and “2-vs.-3” decision boundary lines are identical.


We now consider a special case of the Mossman decision rule in which $y_1 = P(\pi_1|\mathbf{x})$, $y_2 = P(\pi_2|\mathbf{x})$, and $y_3 = P(\pi_3|\mathbf{x})$ for some observational data vector $\mathbf{x}$. As in Sec. 3, we make the substitution in (35); this allows us to rewrite (52)-(54) as

$$(1 + \gamma_2) \frac{P(\pi_1)}{P(\pi_3)} \mathrm{LR}_1 - (1 - \gamma_2) \frac{P(\pi_2)}{P(\pi_3)} \mathrm{LR}_2 = -\gamma_2 \qquad \{\text{“1-vs.-2”}\} \tag{55}$$

$$\gamma_1 \frac{P(\pi_1)}{P(\pi_3)} \mathrm{LR}_1 + \gamma_1 \frac{P(\pi_2)}{P(\pi_3)} \mathrm{LR}_2 = 1 - \gamma_1 \qquad \{\text{“1-vs.-3”}\} \tag{56}$$

$$\gamma_1 \frac{P(\pi_1)}{P(\pi_3)} \mathrm{LR}_1 + \gamma_1 \frac{P(\pi_2)}{P(\pi_3)} \mathrm{LR}_2 = 1 - \gamma_1 \qquad \{\text{“2-vs.-3”}\}. \tag{57}$$

This version of the decision rule is illustrated in Fig. 8.

Fig. 7. Decision rule investigated by Mossman, for the decision parameters $\gamma_1$ and $\gamma_2$, shown in the a posteriori class probability space.

Fig. 8. Decision rule investigated by Mossman, for the decision parameters $\gamma_1$ and $\gamma_2$, shown in likelihood ratio space.


Although the Mossman decision rule for this choice of decision variables appears similar in form to the ideal observer decision rule, recall from Sec. 4 that if two of the decision boundary line equations are identical, the third must yield a line identical to the first two or be undefined. Another way to see this is to note that the coefficients of (10) are differences of the corresponding coefficients of (8) and (9). If the coefficients of (9) and (10) are identical, it must be the case that the coefficients of (8) are all zero. For the Mossman decision rule, this would require $1 + \gamma_2 = 0$, $1 - \gamma_2 = 0$, and $\gamma_2 = 0$ simultaneously, which is clearly impossible.
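The incompatibility argued above can be illustrated by a small parameter sweep. In this sketch the helper names, prior ratios, and grid of $\gamma_1$, $\gamma_2$ values are assumptions for illustration. It builds the coefficient triples of (55)-(57) and tests whether the “1-vs.-2” triple is ever zero or a scalar multiple of the shared “1-vs.-3” and “2-vs.-3” triple, which is what an ideal observer with two coincident lines would require.

```python
from fractions import Fraction as F


def mossman_lines(g1, g2, a1, a2):
    """Coefficient triples (c1, c2, rhs) for c1*LR1 + c2*LR2 = rhs, from (55)-(57).

    a_i stands for the prior ratio P(pi_i) / P(pi_3).
    """
    return {
        "1-vs-2": ((1 + g2) * a1, -(1 - g2) * a2, -g2),
        "1-vs-3": (g1 * a1, g1 * a2, 1 - g1),
        "2-vs-3": (g1 * a1, g1 * a2, 1 - g1),
    }


def linearly_dependent(u, v):
    """True if all 2x2 minors of the pair (u, v) vanish.

    For nonzero v this means u is the zero triple or a scalar multiple of v.
    """
    minors = (
        u[0] * v[1] - u[1] * v[0],
        u[0] * v[2] - u[2] * v[0],
        u[1] * v[2] - u[2] * v[1],
    )
    return all(m == 0 for m in minors)


a1, a2 = F(3), F(2)  # hypothetical prior ratios
gamma1_values = [F(k, 10) for k in range(1, 10)]   # 0 < gamma_1 < 1
gamma2_values = [F(k, 10) for k in range(-9, 10)]  # -1 < gamma_2 < 1

any_compatible = any(
    linearly_dependent(
        mossman_lines(g1, g2, a1, a2)["1-vs-2"],
        mossman_lines(g1, g2, a1, a2)["1-vs-3"],
    )
    for g1 in gamma1_values
    for g2 in gamma2_values
)
```

The sweep finds no compatible parameter pair. This is expected, because the first minor equals $2\gamma_1 [P(\pi_1)/P(\pi_3)][P(\pi_2)/P(\pi_3)]$, which is strictly positive whenever $\gamma_1 > 0$ and the priors are nonzero.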

It follows that, for this particular choice of decision variables (related in a straightforward way to the ideal observer’s decision variables), the decision rule considered by Mossman cannot represent possible ideal observer performance for any choice of the utilities $U_{i|j}$ in (1) and (2). (One can construct probability density functions such that the Mossman observer’s behavior for a particular choice of decision criteria ($\gamma_1$ and $\gamma_2$ in (49)-(51)) corresponds to ideal observer behavior at a particular operating point. However, we do not at present have any reason to believe that this result can be generalized to arbitrary probability density functions or to arbitrary choices of decision criteria for a given choice of probability density functions.)

Mossman proposed that the ROC surface obtained by plotting $P(\mathbf{d} = \pi_3 | \mathbf{t} = \pi_3)$ as a function of $P(\mathbf{d} = \pi_1 | \mathbf{t} = \pi_1)$ and $P(\mathbf{d} = \pi_2 | \mathbf{t} = \pi_2)$ be used to evaluate the performance of the observer. Although this surface is clearly well-defined (the Mossman decision rule has two degrees of freedom, namely the parameters $\gamma_1$ and $\gamma_2$), it follows from the discussion at the end of Sec. 3 that four such surfaces in three-dimensional ROC spaces are needed to completely characterize the tradeoffs among the observer’s conditional classification probabilities.


6  Discussion and Conclusions

We examined three decision rules proposed recently for three-class classification tasks by different researchers. The basis for our evaluation was ideal observer decision theory, primarily because our own interest in the three-class classification task is its possible application to CAD. A major goal in the development of a computerized scheme for CAD is the optimization of the performance of that scheme, in order to provide the maximum benefit to clinicians and thus to their patients. It should thus be kept clearly in mind that the ideal observer framework may not be as relevant, for example, to work which is motivated by purely psychophysical considerations (Scurfield, 1996, 1998; Mossman, 1999), i.e., where the goal is to estimate the properties of an existing observer.

That being said, the three-class classification task is difficult enough that it is perhaps worth making any
attempt to analyze, from a single point of view, the work of the relatively few researchers investigating this
problem, even in cases where that point of view is not necessarily relevant to the underlying motivations for
that work. We feel the insights we have gained from the analysis of various decision rules presented here should
provide at least some justification for that claim.

In particular, Scurfield points out (Scurfield, 1998) that his proposed decision rule is in fact an ideal observer
decision rule for a single ideal observer operating point, namely the observer which maximizes the probability
of any correct response (or “percent correct” or $P_C$). We were able to show that, under various assumptions,
a larger set of such correspondences between the Scurfield observer and the ideal observer exists.

Chan et al. are working on the application of three-class classification to CAD, and thus explicitly take the
ideal observer as the starting point in the development of their decision rule (Chan et al., 2003). Although this
rendered our analysis of that decision rule in terms of ideal observer decision theory largely trivial, their
decision rule merits attention as an example of a situation in which the ideal observer is indeed making use of
information from the three classes of observations (i.e., its behavior is demonstrably different from that of a
two-class ideal observer), while only producing two different responses for those observations. In two-class
classification, the only corresponding examples are trivial: either the observer always calls observations
positive (achieving an operating point of (FPF = 1, TPF = 1), where FPF is the false-positive fraction and TPF
the true-positive fraction) or always calls them negative (FPF = 0, TPF = 0).

Finally, we showed that, given a particular and obvious choice of ideal-observer-related decision variables, the
decision rule proposed by Mossman (Mossman, 1999) does not correspond to ideal observer behavior for any
possible values of the observer’s utilities. However, we note that the structure of the Mossman decision rule (a
simple sequence of thresholds on single decision variables) may indeed serve as a reasonable model for human
observer performance in certain situations, e.g., differential diagnosis. That such a decision rule fails to be an
ideal observer decision rule may be considered surprising, given the properties the Mossman decision rule shares
with that of Chan et al., in particular the identity of two out of the three decision boundary lines. The reasons
why one decision rule can be said to correspond to ideal observer behavior, while a rule similar in structure does
not when used with a particular and obvious choice of decision variables, are connected to fundamental
constraints on the ideal observer’s behavior; given the inherent complexities of the three-class classification
task, it is easy for such subtleties to be overwhelmed by other details. A close comparison of two possible
three-class classification decision rules can thus provide an immediate and intuitive understanding of such
properties, even though a complete and fully general solution to the three-class classification problem remains
elusive.

ABSTRACT

We have shown in previous work that an ideal observer in a classification task with $N$ classes achieves the optimal
receiver operating characteristic (ROC) hypersurface in a Neyman-Pearson sense. That is, the hypersurface
obtained by taking one of the ideal observer’s misclassification probabilities as a function of the other $N^2 - N - 1$
misclassification probabilities is never above the corresponding hypersurface obtained by any other observer.
Due to the inherent complexity of evaluating observer performance in an $N$-class classification task with $N > 2$,
some researchers have suggested a generally incomplete but more tractable evaluation in terms of a hypersurface
plotting only the $N$ “sensitivities” (the probabilities of correctly classifying observations in the various classes).
An $N$-class observer generally has up to $N^2 - N - 1$ degrees of freedom, so a given sensitivity will still vary when
the other $N - 1$ are held fixed; a well-defined hypersurface can be constructed by considering only the maximum
possible value of one sensitivity for each achievable value of the other $N - 1$. We show that optimal performance
in terms of this generally incomplete performance descriptor, in a Neyman-Pearson sense, is still achieved by
the $N$-class ideal observer. That is, the hypersurface obtained by taking the maximal value of one of the ideal
observer’s correct classification probabilities as a function of the other $N - 1$ is never below the corresponding
hypersurface obtained by any other observer.

Keywords:  ROC  analysis,  three-class  classification,  ideal  observer  decision  rules 

1.  INTRODUCTION 

We are attempting to extend the well-known observer performance evaluation methodology of receiver operating
characteristic (ROC) analysis [1, 2] to classification tasks with three classes. This could conceivably be of benefit,
for example, in a medical decision-making task in which a region of a patient image must be characterized as
containing a malignant lesion, a benign lesion, or only normal tissue.

Unfortunately, a fully general but tractable extension of ROC analysis has yet to be developed. It is known
that the performance of an observer in a classification task with $N$ classes ($N > 2$) can be completely described
by a set of $N^2 - N$ conditional error probabilities [4, 5], and that the performance of the ideal observer (that
which minimizes Bayes risk [4]) is completely characterized by an ROC hypersurface in which these conditional
error probabilities depend on a set of $N^2 - N - 1$ decision criteria [5]. Although analytic expressions for the ideal
observer’s conditional error probabilities given reasonable models for the underlying observational data have
been worked out in the two-class case [6], this has not yet been accomplished in a fully general manner for tasks
with three or more classes. Furthermore, we have shown that an obvious generalization of the area under the
ROC curve (AUC) does not in fact yield a useful performance metric in tasks with three or more classes [7]. More
recently, we showed that complicated constraining relationships exist among the decision criteria themselves for
the ideal observer [8]. These constraining relationships appear to imply that it is highly unlikely that analytical
expressions for the conditional error probabilities in terms of the decision criteria can be developed which are as
simple to interpret as those for the two-class task [6].

Despite the difficulties just described, the potential benefits to be gained from a practical performance
evaluation methodology for classification tasks with three classes have motivated a number of research groups to
propose such methods. These practical methods reduce the number of degrees of freedom required to describe
the observer’s performance, either by implicitly leaving the remaining degrees of freedom out of the analysis, or
by explicitly imposing restrictions on the form of the observer’s decision rule or on the set of decision criteria
used by the observer.

Scurfield evaluated an observer which used a specified decision rule with only two degrees of freedom (as
opposed to the five decision criteria used by the general three-class ideal observer) by plotting a set of six
(two-dimensional) surfaces in three-dimensional ROC spaces [9]. Mossman proposed plotting the surface formed
only from the set of three “sensitivities” (conditional probabilities of correctly classifying observations) for an
observer with two degrees of freedom, and applied this method to an observer with a specified decision rule [10].
Chan et al. began with an ideal observer model, and reduced the number of decision criteria from five to two by
imposing explicit assumptions on the observer’s decision utilities; the observer’s performance was then plotted
as a surface in a three-dimensional ROC space, the axes of which are the probabilities of deciding an observation
to be malignant conditional on each of the three actual class memberships [11]. He et al. investigated an ideal
observer model in which the decision rule is restricted to a form similar to that proposed by Scurfield; the nature
of the restrictions is such that performance evaluation in terms of only the three sensitivities provides a complete
description of this observer’s performance [12].

A common theme among these remarkably diverse methods is the idea of an “ROC surface,” i.e., a surface
with two degrees of freedom in a three-dimensional ROC space. An appealing feature of such a construct is
its visualizability: it can be plotted as readily as any elevation map, for example, in stark contrast to the fully
general three-class classification task involving a hypersurface with five degrees of freedom in a six-dimensional
ROC space as mentioned above. While it is true that not all of the proposed methods described in the preceding
paragraph involve a “sensitivity” ROC surface, the general division of an $N$-class observer’s conditional decision
probabilities into a set of $N$ sensitivities and a set of $N^2 - N$ misclassification rates [5] makes this particular
construct a natural candidate for further analysis.

On the other hand, it can be argued that measurement of performance in terms of only $N$ conditional
classification rates must be an incomplete description of observer performance in a classification task with
more than two classes, which requires $N^2 - N$ such classification rates as stated above. Acknowledging this
incompleteness, we would like to ask whether there is any sense in which such an incomplete performance metric
is at least well-defined. In particular, is there any observer decision rule, dependent on only $N - 1$ (rather
than $N^2 - N - 1$) decision criteria, for which the observer’s sensitivity ROC hypersurface is always above the
corresponding hypersurface obtained for any other observer? If so, what form does this decision rule take?

In the next section, we show that the three-class observer which optimizes performance only in terms of the
sensitivity surface is in fact the three-class ideal observer, with its decision utilities constrained in a particular
way (reducing its degrees of freedom from five to two as necessary). Additionally, the form of the constraints
on the ideal observer’s behavior is identical to that considered by He et al. [12]. In Sec. 3, we extend this result
to the general case of an $N$-class observer, showing that the observer which attains the optimal sensitivity
hypersurface is a restricted form of the $N$-class ideal observer, and in particular a straightforward generalization
of the three-class observer considered by He et al. [12] to $N$ classes. Our conclusions are stated in Sec. 4.

2.  THREE-CLASS  OBSERVERS 

We have shown [5] that the $N$-class ideal observer (that observer which minimizes Bayes risk) also achieves
optimal performance in an ROC sense, by virtue of satisfying the Neyman-Pearson criterion. This was the same
argument used by Van Trees [4] to show that the two-class ideal observer achieves the optimal ROC curve for
a given two-class classification task. This technique of satisfying the Neyman-Pearson criterion, essentially an
application of an integral form of the method of Lagrange multipliers [13], is straightforward (conceptually, if not
notationally) and flexible, and we apply it in this section to answer the question of what observer optimizes
performance in terms of only the three observer sensitivities.

We denote by $P_{ij}$ the conditional probability of a given observer deciding an observation is drawn from the
$i$th class, conditional on it actually being drawn from the $j$th class. Thus, the three sensitivities are $P_{11}$, $P_{22}$,
and $P_{33}$. Decisions are assumed to be made based on statistically variable observational data; in particular,

$$P_{ij} = \int_{Z_i} p(\vec{x}|\pi_j)\, d^m\vec{x}, \tag{1}$$

where $Z_i$ is the region for which observations $\vec{x}$ (of dimension $m$) are decided to belong to the class labeled $\pi_i$
($1 \le i \le 3$).
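
The integral in Eq. 1 is rarely available in closed form, but it can be approximated by sampling. The sketch below is illustrative only: the function name `estimate_confusion`, the one-dimensional Gaussian class densities, and the two-threshold decision rule are assumptions chosen for the example and do not come from the text. It estimates the full matrix of $P_{ij}$ by Monte Carlo and reads the sensitivities off the diagonal. Indices are zero-based in the code, so `P[0, 0]` plays the role of $P_{11}$.

```python
import numpy as np

def estimate_confusion(decide, samplers, n_samples=20000, seed=0):
    """Monte Carlo estimate of P[i, j] = Pr(decide class i | truth is class j)."""
    rng = np.random.default_rng(seed)
    n_classes = len(samplers)
    P = np.zeros((n_classes, n_classes))
    for j, sample in enumerate(samplers):
        x = sample(rng, n_samples)        # draws from p(x | pi_j)
        decisions = decide(x)             # integer labels 0 .. n_classes-1
        P[:, j] = np.bincount(decisions, minlength=n_classes) / n_samples
    return P

# Hypothetical task: three unit-variance 1-D Gaussians with different means.
means = [-2.0, 0.0, 2.0]
samplers = [lambda rng, n, m=m: rng.normal(m, 1.0, size=n) for m in means]

# Hypothetical decision rule: the regions Z_i are x < -1, -1 <= x < 1, x >= 1.
decide = lambda x: np.digitize(x, [-1.0, 1.0])

P = estimate_confusion(decide, samplers)
sensitivities = np.diag(P)                # estimates of P_11, P_22, P_33
```

Each column of `P` sums to one, which is the property used later to rewrite the sensitivities in terms of misclassification probabilities.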

Without loss of generality, we seek to maximize $P_{33}$ subject to the constraints $P_{11} = \alpha_{11}$ and $P_{22} = \alpha_{22}$,
where $0 \le \alpha_{11} \le 1$ and $0 \le \alpha_{22} \le 1$. We define the function

$$F = P_{33} + \lambda_{11}(P_{11} - \alpha_{11}) + \lambda_{22}(P_{22} - \alpha_{22}), \tag{2}$$

where $\lambda_{11}$ and $\lambda_{22}$ are the so-called Lagrange multipliers. Note that if we can find a decision rule (a partitioning
of the domain of $\vec{x}$ into $Z_1$, $Z_2$, and $Z_3$) that maximizes $F$ for arbitrary values of $\lambda_{11}$ and $\lambda_{22}$, then this will
be equivalent to maximizing $P_{33}$ at the point at which the constraint equations are satisfied (i.e., at the point
$P_{11} = \alpha_{11}$, $P_{22} = \alpha_{22}$).

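We first rewrite $F$ by applying rules for conditional probabilities:

$$\begin{aligned}
F &= -\lambda_{11}\alpha_{11} - \lambda_{22}\alpha_{22} + (1 - P_{13} - P_{23}) + \lambda_{11}(1 - P_{21} - P_{31}) + \lambda_{22}(1 - P_{12} - P_{32}) \\
  &= 1 + \lambda_{11}(1 - \alpha_{11}) + \lambda_{22}(1 - \alpha_{22}) - \{\lambda_{22}P_{12} + P_{13} + \lambda_{11}P_{21} + P_{23} + \lambda_{11}P_{31} + \lambda_{22}P_{32}\} \\
  &= 1 + \lambda_{11}(1 - \alpha_{11}) + \lambda_{22}(1 - \alpha_{22}) - \Big\{ \int_{Z_1} \big[\lambda_{22}\, p(\vec{x}|\pi_2) + p(\vec{x}|\pi_3)\big]\, d^m\vec{x} \\
  &\qquad + \int_{Z_2} \big[\lambda_{11}\, p(\vec{x}|\pi_1) + p(\vec{x}|\pi_3)\big]\, d^m\vec{x} + \int_{Z_3} \big[\lambda_{11}\, p(\vec{x}|\pi_1) + \lambda_{22}\, p(\vec{x}|\pi_2)\big]\, d^m\vec{x} \Big\}.
\end{aligned} \tag{3}$$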

For a given set of values of the parameters $\lambda_{11}$ and $\lambda_{22}$, $F$ is maximized when the quantity in braces is minimized.
This quantity, in turn, can be minimized by assigning a given $\vec{x}$ to the region $Z_i$ such that the $i$th integrand
(from among the integrals in braces in Eq. 3) is minimized. (Situations in which two or more of the integrands
yield the same minimal value for a given $\vec{x}$ can be decided in an arbitrary but consistent fashion.)

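That is,

$$\text{decide } \pi_1 \text{ iff } \lambda_{22}\, p(\vec{x}|\pi_2) < \lambda_{11}\, p(\vec{x}|\pi_1) \text{ and } p(\vec{x}|\pi_3) < \lambda_{11}\, p(\vec{x}|\pi_1) \tag{4}$$

$$\text{decide } \pi_2 \text{ iff } \lambda_{11}\, p(\vec{x}|\pi_1) < \lambda_{22}\, p(\vec{x}|\pi_2) \text{ and } p(\vec{x}|\pi_3) < \lambda_{22}\, p(\vec{x}|\pi_2) \tag{5}$$

$$\text{decide } \pi_3 \text{ iff } \lambda_{11}\, p(\vec{x}|\pi_1) < p(\vec{x}|\pi_3) \text{ and } \lambda_{22}\, p(\vec{x}|\pi_2) < p(\vec{x}|\pi_3). \tag{6}$$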

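We can divide these relations by $p(\vec{x}|\pi_3)$ to obtain

$$\text{decide } \pi_1 \text{ iff } \lambda_{11}\mathrm{LR}_1 - \lambda_{22}\mathrm{LR}_2 > 0 \text{ and } \lambda_{11}\mathrm{LR}_1 > 1 \tag{7}$$

$$\text{decide } \pi_2 \text{ iff } \lambda_{11}\mathrm{LR}_1 - \lambda_{22}\mathrm{LR}_2 < 0 \text{ and } \lambda_{22}\mathrm{LR}_2 > 1 \tag{8}$$

$$\text{decide } \pi_3 \text{ iff } \lambda_{11}\mathrm{LR}_1 < 1 \text{ and } \lambda_{22}\mathrm{LR}_2 < 1, \tag{9}$$

where $\mathrm{LR}_i = p(\vec{x}|\pi_i)/p(\vec{x}|\pi_3)$ are the likelihood ratio decision variables used by the ideal observer [4, 5]. The decision
boundary lines which partition the $(\mathrm{LR}_1, \mathrm{LR}_2)$ decision plane into the regions $Z_1$, $Z_2$, and $Z_3$ are thus

$$\lambda_{11}\mathrm{LR}_1 - \lambda_{22}\mathrm{LR}_2 = 0 \tag{10}$$

$$\lambda_{11}\mathrm{LR}_1 = 1 \tag{11}$$

$$\lambda_{22}\mathrm{LR}_2 = 1. \tag{12}$$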


Note that Eq. 12 is just the difference between Eqs. 10 and 11. If we require $\lambda_{11}$ and $\lambda_{22}$ to be positive, the
decision rule is an ideal observer decision rule [5]. Since neither the decision variables nor the form of the decision
rule depends on the particular choices of $\alpha_{11}$ and $\alpha_{22}$, we can conclude that the three-class sensitivity ROC
surface, obtained by allowing $\lambda_{11}$ and $\lambda_{22}$ to take on all possible positive values, is optimal for the observer
defined in Eqs. 10-12, in the sense that no other observer can achieve a higher sensitivity surface (i.e., a surface
with a greater value of $P_{33}$ at a given value of $(P_{11}, P_{22})$). The optimal observer for this performance metric is
seen to be the three-class ideal observer, with its decision criteria constrained so that the line separating classes
$\pi_1$ and $\pi_3$ is vertical, the line separating classes $\pi_2$ and $\pi_3$ is horizontal, and the line separating classes $\pi_1$ and
$\pi_2$ passes through the origin with slope $\lambda_{11}/\lambda_{22}$ (and thus intersects the other two lines as required). Note that
the number of free decision criteria has been reduced from five (for the general three-class ideal observer) to two
(as expected for a surface in a three-dimensional ROC space).

Figure 1. The decision rule which is found to be optimal in the sense of maximizing the ROC surface composed of only
the observer sensitivities. The decision variables are the likelihood ratios used by the general three-class ideal observer,
and the number of decision criteria is reduced from five (for the general three-class ideal observer) to two.

This decision rule is shown in Fig. 1. It is interesting to note that this observer is identical to the special case
of the ideal observer evaluated by He et al. [12], which we have shown [14, 15] to be a special case of the decision rule
proposed by Scurfield [9].
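
The rule of Eqs. 7-9 can be sketched in a few lines. The function name `decide_three_class` and the tie-breaking choice are assumptions made for illustration; the text only requires that ties be resolved in an arbitrary but consistent fashion. The sketch also makes explicit that Eqs. 7-9 amount to choosing whichever of $\lambda_{11}\mathrm{LR}_1$, $\lambda_{22}\mathrm{LR}_2$, and $1$ is largest.

```python
import numpy as np

def decide_three_class(lr1, lr2, lam11, lam22):
    """Apply Eqs. 7-9 to arrays of likelihood ratios; returns labels 1, 2, or 3.

    Assumed tie-breaking: a tie between classes 1 and 2 goes to class 1, and
    a point lying exactly on a threshold against class 3 goes to class 3.
    """
    a = lam11 * np.asarray(lr1, dtype=float)   # lambda_11 * LR_1
    b = lam22 * np.asarray(lr2, dtype=float)   # lambda_22 * LR_2
    return np.where((a > 1) & (a >= b), 1, np.where(b > 1, 2, 3))

# Three illustrative points in the (LR_1, LR_2) plane with lambda_11 = lambda_22 = 1.
lr1 = np.array([2.0, 0.5, 0.2])
lr2 = np.array([0.5, 3.0, 0.3])
labels = decide_three_class(lr1, lr2, 1.0, 1.0)

# Cross-check against "pick the largest of (lambda_11 LR_1, lambda_22 LR_2, 1)".
rng = np.random.default_rng(0)
r1, r2 = rng.exponential(1.0, 1000), rng.exponential(1.0, 1000)
stacked = np.stack([0.7 * r1, 1.8 * r2, np.ones(1000)], axis=1)
same = np.array_equal(decide_three_class(r1, r2, 0.7, 1.8),
                      np.argmax(stacked, axis=1) + 1)
```

Sweeping `lam11` and `lam22` over positive values and estimating the three sensitivities at each setting would trace out the sensitivity ROC surface discussed above.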


3.  N-CLASS OBSERVERS

The results of the preceding section can be generalized to tasks with $N$ classes for any $N > 2$. We now have
a set of $N^2$ conditional classification probabilities $P_{ij}$, with $N$ sensitivities $P_{ii}$. Equation 1 remains unchanged,
except that there are of course now $N$ regions $Z_i$ into which the domain of $\vec{x}$ is partitioned (i.e., classes into
which the observations are classified), and the observations are drawn from $N$ distributions of the form $p(\vec{x}|\pi_j)$.

Without loss of generality, we seek to maximize $P_{NN}$ subject to the constraints $P_{ii} = \alpha_{ii}$ for $1 \le i \le N - 1$,
where $0 \le \alpha_{ii} \le 1$. We define the function

$$F = P_{NN} + \sum_{i=1}^{N-1} \lambda_{ii}(P_{ii} - \alpha_{ii}), \tag{13}$$

where the $\lambda_{ii}$ are the Lagrange multipliers. Note that if we can find a decision rule (a partitioning of the
domain of $\vec{x}$ into $Z_i$ $\{1 \le i \le N\}$) that maximizes $F$ for arbitrary values of the $\lambda_{ii}$, then this will be equivalent
to maximizing $P_{NN}$ at the point at which the constraint equations are satisfied (i.e., at the point $P_{ii} = \alpha_{ii}$
$\{1 \le i \le N - 1\}$).

As  in  the  preceding  section,  we  rewrite  F  by  applying  rules  for  conditional  probabilities  to  obtain: 
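
$$F = 1 + \sum_{i=1}^{N-1} \lambda_{ii}(1 - \alpha_{ii}) - \Bigg\{ \sum_{i=1}^{N-1} \int_{Z_i} \Big[ p(\vec{x}|\pi_N) + \sum_{\substack{j=1 \\ j \ne i}}^{N-1} \lambda_{jj}\, p(\vec{x}|\pi_j) \Big]\, d^m\vec{x} + \int_{Z_N} \sum_{j=1}^{N-1} \lambda_{jj}\, p(\vec{x}|\pi_j)\, d^m\vec{x} \Bigg\}. \tag{14}$$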


For a given set of values of the parameters $\lambda_{ii}$ $\{1 \le i \le N - 1\}$, $F$ is maximized when the quantity in braces
is minimized. This quantity, in turn, can be minimized by assigning a given $\vec{x}$ to the region $Z_i$ such that the
$i$th integrand (from among the integrals in braces in Eq. 14) is minimized.
(Situations in which two or more of the integrands yield the same minimal value for a given $\vec{x}$ can be decided in
an arbitrary but consistent fashion.)

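That is,

$$\begin{aligned}
\text{decide } \pi_i\ \{i < N\} \text{ iff } \quad & \lambda_{jj}\, p(\vec{x}|\pi_j) < \lambda_{ii}\, p(\vec{x}|\pi_i) \quad \{i < j < N\} \\
\text{and } \quad & p(\vec{x}|\pi_N) < \lambda_{ii}\, p(\vec{x}|\pi_i) \\
\text{and } \quad & \lambda_{jj}\, p(\vec{x}|\pi_j) < \lambda_{ii}\, p(\vec{x}|\pi_i) \quad \{j < i < N\}
\end{aligned} \tag{15}$$

$$\text{decide } \pi_N \text{ iff } \lambda_{jj}\, p(\vec{x}|\pi_j) < p(\vec{x}|\pi_N) \quad \{j < N\}. \tag{16}$$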


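We can divide these relations by $p(\vec{x}|\pi_N)$ to obtain

$$\begin{aligned}
\text{decide } \pi_i\ \{i < N\} \text{ iff } \quad & \lambda_{ii}\mathrm{LR}_i - \lambda_{jj}\mathrm{LR}_j > 0 \quad \{i < j < N\} \\
\text{and } \quad & \lambda_{ii}\mathrm{LR}_i > 1 \\
\text{and } \quad & \lambda_{jj}\mathrm{LR}_j - \lambda_{ii}\mathrm{LR}_i < 0 \quad \{j < i < N\}
\end{aligned} \tag{17}$$

$$\text{decide } \pi_N \text{ iff } \lambda_{jj}\mathrm{LR}_j < 1 \quad \{j < N\}, \tag{18}$$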


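where $\mathrm{LR}_i = p(\vec{x}|\pi_i)/p(\vec{x}|\pi_N)$ are the likelihood ratio decision variables used by the ideal observer [4, 5]. The
decision boundary hyperplanes which partition the $\mathbf{LR} = (\mathrm{LR}_1, \ldots, \mathrm{LR}_{N-1})$ decision space into the regions $Z_i$
are thus

$$\lambda_{ii}\mathrm{LR}_i - \lambda_{jj}\mathrm{LR}_j = 0 \quad \{i < j < N\} \tag{19}$$

$$\lambda_{ii}\mathrm{LR}_i = 1 \quad \{i < N\}. \tag{20}$$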

Note that any of these equations, for example that defining part of the boundary between classes $\pi_j$ and $\pi_k$, can
be expressed as the difference of two other such equations (in this example, those defining boundaries between
classes $\pi_i$ and $\pi_j$, and between classes $\pi_i$ and $\pi_k$). If we require the $\lambda_{ii}$ to be positive, the resulting decision rule
is an ideal observer decision rule [5]. Since neither the decision variables nor the form of the decision rule depends
on the particular choices of $\alpha_{ii}$, we can conclude that the $N$-class sensitivity ROC hypersurface, obtained by
allowing the $\lambda_{ii}$ to take on all possible positive values, is optimal for the observer defined in Eqs. 19 and 20, in
the sense that no other observer can achieve a higher sensitivity hypersurface (i.e., one with a greater value of
$P_{NN}$ at a given value of $(P_{11}, \ldots, P_{(N-1)(N-1)})$). The optimal observer for this performance metric is seen to
be the $N$-class ideal observer, with its decision criteria constrained so that the boundary separating classes $\pi_i$
and $\pi_N$ is a hyperplane defined by $\mathrm{LR}_i = 1/\lambda_{ii}$, while the boundary separating classes $\pi_i$ and $\pi_j$ is a hyperplane
defined by $\lambda_{ii}\mathrm{LR}_i = \lambda_{jj}\mathrm{LR}_j$.

Although an intuitive geometric understanding of this decision rule is more elusive than in the three-class
case, it is at least evident that the boundaries intersect as expected; that is, the boundary separating classes
$\pi_i$ and $\pi_j$ intersects the boundary separating classes $\pi_i$ and $\pi_k$, and also intersects the boundary separating
classes $\pi_j$ and $\pi_k$. Note also that the number of free decision criteria has been reduced from $N^2 - N - 1$ (for
the general $N$-class ideal observer) to $N - 1$ (as expected for a hypersurface in an $N$-dimensional ROC space).
More importantly, comparison of Eqs. 19 and 20 with Eqs. 10-12 reveals this $N$-class observer to be an obvious
extension from three to $N$ classes of the observer described in the preceding section.
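
The $N$-class decision rule above can be read as choosing the class whose weighted likelihood $\lambda_{ii}\, p(\vec{x}|\pi_i)$ is largest, with the weight for class $\pi_N$ fixed at one. A minimal sketch under that reading follows. The function name `decide_n_class`, the helper `gaussian_likelihoods`, and the equal-variance Gaussian class densities are assumptions introduced for illustration, and `np.argmax` resolves ties in favor of the lowest class index, which is one arbitrary but consistent choice.

```python
import numpy as np

def decide_n_class(likelihoods, lambdas):
    """Restricted N-class rule: label = argmax_i lambda_ii * p(x | pi_i), lambda_NN = 1.

    likelihoods: array of shape (n_obs, N), column j holding p(x | pi_{j+1}).
    lambdas:     N-1 positive multipliers lambda_11 .. lambda_(N-1)(N-1).
    Returns integer labels in 1..N.
    """
    likelihoods = np.asarray(likelihoods, dtype=float)
    weights = np.append(np.asarray(lambdas, dtype=float), 1.0)
    return np.argmax(likelihoods * weights, axis=1) + 1

def gaussian_likelihoods(x, means):
    """Hypothetical class densities: unit-variance 1-D Gaussians, one per class."""
    x = np.asarray(x, dtype=float)[:, None]
    m = np.asarray(means, dtype=float)[None, :]
    return np.exp(-0.5 * (x - m) ** 2) / np.sqrt(2.0 * np.pi)

x = [-0.2, 1.1, 1.9, 3.4]
like = gaussian_likelihoods(x, means=[0.0, 1.0, 2.0, 3.0])   # N = 4 classes

equal_weights = decide_n_class(like, [1.0, 1.0, 1.0])   # nearest-mean classification
favor_class_1 = decide_n_class(like, [1e6, 1.0, 1.0])   # large lambda_11 enlarges Z_1
```

With all multipliers equal to one and equal-variance Gaussians the rule reduces to nearest-mean classification, while raising a single $\lambda_{ii}$ enlarges the corresponding region $Z_i$ and so raises $P_{ii}$ at the expense of the other sensitivities.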

4.  CONCLUSIONS 

A  fully  general  performance  evaluation  methodology  for  the  three-class  classification  task  has  yet  to  be  developed, 
a  frustrating  state  of  affairs  given  the  great  success  and  wide  application  of  ROC  analysis  to  two-class  classification 
tasks.  A  primary  reason  for  the  difficulty  in  developing  a  fully  general  extension  of  ROC  analysis  to  the  three- 
class  classification  task  is  the  rapid  increase  in  the  number  of  performance  measurement  variables  and  decision 
criteria  necessary  to  characterize  observer  (in  particular,  ideal  observer)  performance.  Specifically,  the  number 
of sensitivities or misclassification rates needed increases from two to six (and to $N^2 - N$ in the general case),
while the number of decision criteria increases from a single decision variable threshold to a set of five mutually
constrained [8] criteria (and to $N^2 - N - 1$ in the general case). In short, the complexity of the problem increases
not  linearly  with  the  number  of  classes,  but  quadratically. 
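
The counts quoted in this paragraph can be tabulated directly. The helper below is a small illustrative sketch; the function name `roc_dimensions` and the dictionary keys are hypothetical labels for quantities stated in the text.

```python
def roc_dimensions(n_classes):
    """Counts of performance variables and decision criteria for an N-class task."""
    if n_classes < 2:
        raise ValueError("a classification task needs at least two classes")
    n = n_classes
    return {
        "conditional_probabilities": n * n,   # all P_ij
        "error_probabilities": n * n - n,     # enough for a complete description
        "decision_criteria": n * n - n - 1,   # general ideal observer
        "sensitivities": n,                   # the P_ii only
        "restricted_criteria": n - 1,         # sensitivity-optimal observer
    }

two_class = roc_dimensions(2)
three_class = roc_dimensions(3)
```

For two classes this gives two error probabilities and a single decision criterion; for three classes it gives six and five, against three sensitivities and two criteria for the restricted observer.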

The  motivation  for  the  numerous  proposed  methods,  outlined  in  Sec.  1,  for  evaluating  the  performance  of 
a  three-class  classifier  in  terms  of  two-dimensional  surfaces  in  three-dimensional  ROC  spaces  (rather  than  the 
five-dimensional  hypersurfaces  in  six-dimensional  ROC  spaces  required  by  the  theory)  is  thus  quite  clear.  We 
currently  lack  a  theoretical  framework  with  which  to  judge  the  appropriateness  of  any  of  the  proposed  methods 
to  any  particular  classification  task.  However,  even  if  one  chooses  to  adopt  a  performance  evaluation  metric 
known  to  provide  an  incomplete  description  of  observer  performance,  it  is  still  reasonable  to  ask  what  observer, 
if  any,  will  achieve  optimal  performance  with  respect  to  that  metric. 

We have addressed that question in regard to measurement of an observer’s performance in terms of only
its sensitivities (the probabilities of correctly classifying the three, or in general $N$, classes of observations).
Theoretically, this is clearly an incomplete measure of performance (another set of three, or in general $N^2 - 2N$,
misclassification rates is necessary). Conceding this point, we consider it a nontrivial observation, derived in
the preceding sections, that the observer which optimizes this limited performance metric is not one unrelated
to the general ideal observer, nor an arcane special case of the ideal observer, but a special case of the ideal
observer which is in a subjective sense quite simple, and which has been independently evaluated from very
different perspectives by other researchers [9, 12]. We find these results at once reassuring and encouraging, and
hope that research into this thorny problem will continue to bear unexpected fruit.
Abstract

We have shown previously that an $N$-class ideal observer achieves the optimal receiver operating
characteristic (ROC) hypersurface in a Neyman-Pearson sense. Due to the inherent complexity
of  evaluating  observer  performance  even  in  a  three-class  classification  task,  some  researchers  have 
suggested  a  generally  incomplete  but  more  tractable  evaluation  in  terms  of  a  surface  plotting  only  the 
three  “sensitivities.”  More  generally,  one  can  evaluate  observer  performance  with  a  single  sensitivity  or 
misclassification  probability  as  a  function  of  two  linear  combinations  of  sensitivities  or  misclassification 
probabilities.  We  consider  four  such  formulations  including  the  “sensitivity”  surface.  In  each  case  we 
show  that  the  optimal  observer  with  respect  to  the  given  evaluation  method  is  a  special  case  of  the  ideal 
observer,  with  certain  constraints  placed  on  the  ideal  observer’s  decision  utilities.  Furthermore,  we  show 
that  if  these  utility  constraints  are  imposed  on  a  general  expression  for  expected  utility,  this  quantity  is 
found  to  depend  only  on  those  sensitivities  and  misclassification  probabilities  used  to  construct  the  ROC 
surface  in  question.  That  is,  for  the  observer  which  maximizes  performance  with  respect  to  the  given 
restricted  ROC  surface,  that  ROC  surface  provides  a  complete  description  of  the  observer’s  performance 
in  an  expected-utility  sense. 
Optimization of restricted ROC surfaces in
three-class  classification  tasks 

I.  Introduction 

We  are  attempting  to  extend  the  well-known  observer  performance  evaluation  methodology  of 
receiver  operating  characteristic  (ROC)  analysis  [1],  [2]  to  classification  tasks  with  three  classes. 
This  could  conceivably  be  of  benefit,  for  example,  in  a  medical  decision-making  task  in  which 
a  region  of  a  patient  image  must  be  characterized  as  containing  a  malignant  lesion,  a  benign 
lesion,  or  only  normal  tissue  [3]. 

Unfortunately,  a  fully  general  extension  of  ROC  analysis  has  yet  to  be  developed.  It  is  known 
that the performance of an observer in a classification task with $N$ classes ($N > 2$) can be
completely described by a set of $N^2 - N$ conditional error probabilities [4], [5], and that
the performance of the ideal observer (that which minimizes Bayes risk [4]) is completely
characterized by an ROC hypersurface in which these conditional error probabilities depend on
a set of $N^2 - N - 1$ decision criteria [5]. Although analytic expressions for the ideal observer’s
conditional error probabilities given reasonable models for the underlying observational data
have  been  worked  out  in  the  two-class  case  [6],  this  has  not  yet  been  accomplished  in  a 
fully  general  manner  for  tasks  with  three  or  more  classes.  Furthermore,  we  have  shown  that 
an  obvious  generalization  of  the  area  under  the  ROC  curve  (AUC)  does  not  in  fact  yield  a 
useful  performance  metric  in  tasks  with  three  or  more  classes  [7].  More  recently,  we  showed 
that  complicated  constraining  relationships  exist  among  the  decision  criteria  themselves  for  the 
ideal  observer  [8].  These  constraining  relationships  appear  to  imply  that  it  is  highly  unlikely  that 
analytical  expressions  for  the  conditional  error  probabilities  in  terms  of  the  decision  criteria  can 
be  developed  which  are  as  simple  to  interpret  as  those  for  the  two-class  task  [6]. 

Despite  the  difficulties  just  described,  the  potential  benefits  to  be  gained  from  a  practical 
performance  evaluation  methodology  for  classification  tasks  with  three  classes  have  motivated  a 
number  of  research  groups  to  propose  such  methods.  These  practical  methods  reduce  the  number 
of degrees of freedom required to describe the observer’s performance, either by implicitly leaving
the remaining degrees of freedom out of the analysis, or by explicitly imposing restrictions
on the form of the observer’s decision rule or on the set of decision criteria used by the observer.

Scurfield  evaluated  an  observer  which  used  a  specified  decision  rule  with  only  two  degrees 
of  freedom  (in  general  a  three-class  observer  can  have  up  to  five  degrees  of  freedom)  by 
plotting  a  set  of  six  (two-dimensional)  surfaces  in  three-dimensional  ROC  spaces  [9].  Mossman 
proposed  plotting  the  surface  formed  only  from  the  set  of  three  “sensitivities”  (conditional 
probabilities  of  correctly  classifying  observations)  for  an  observer  with  two  degrees  of  freedom, 
and applied this method to an observer with a specified decision rule [10]. Chan et al. began
with  an  ideal  observer  model,  and  reduced  the  number  of  decision  criteria  from  five  to  two  by 
imposing  explicit  assumptions  on  the  observer’s  decision  utilities;  a  description  of  the  observer’s 
performance  (which  they  also  showed  to  be  complete)  was  then  plotted  as  a  surface  in  a  three- 
dimensional  ROC  space,  the  axes  of  which  are  the  probabilities  of  deciding  an  observation 
to be malignant conditional on each of the three actual class memberships [11]. He et al.
investigated  a  special  case  of  the  ideal  observer  model  which  is  also  a  special  case  of  the 
decision  rule  proposed  by  Scurfield;  they  showed  that  due  to  the  assumptions  of  their  model, 
performance  evaluation  in  terms  of  only  the  three  sensitivities  provides  a  complete  description 
of  this  observer’s  performance  [12]. 

A  common  theme  among  these  remarkably  diverse  methods  is  the  idea  of  an  “ROC  surface,” 
i.e.,  a  surface  with  two  degrees  of  freedom  in  a  three-dimensional  ROC  space.  An  appealing 
feature  of  such  a  construct  is  its  visualizability:  it  can  be  plotted  as  readily  as  any  elevation 
map,  for  example,  in  stark  contrast  to  the  fully  general  three-class  classification  task  involving  a 
hypersurface  with  five  degrees  of  freedom  in  a  six-dimensional  ROC  space  as  mentioned  above. 

On the other hand, it can be argued that measurement of three-class classification performance
in terms of only three conditional classification rates will yield an incomplete description of
observer performance. (A complete description should require six such conditional classification
rates as stated above.) Acknowledging this possible incompleteness, we would like to ask whether
there is any sense in which such a restricted performance evaluation method is at least
well-defined. In particular, suppose we elect to measure performance in terms of an ROC surface (given
by a single sensitivity or conditional error rate as a function of two different linear combinations
of other sensitivities or conditional error rates). We then ask, is there any observer decision rule,
dependent on only two (rather than five) decision criteria, for which the specified ROC surface
is never below (when the surface’s dependent variable is a sensitivity) or never above (when the
surface’s dependent variable is a conditional error rate) the corresponding surface obtained for
any other observer? If so, what form does this decision rule take?

In  the  remainder  of  this  work,  four  different  observer  decision  strategies  proposed  recently  in 
the  literature  are  analyzed  with  regard  to  the  questions  just  posed.  Each  strategy  considered  is 
a  special  case  of  the  three-class  ideal  observer,  which  classifies  observations  by  maximizing  the 
expected  utility  of  its  decisions.  For  each  special  case  considered  here,  the  expected  utility  is 
constrained  to  depend  on  only  three  (rather  than  six)  conditional  classification  rates.  We  show, 
in  each  case,  that  the  observer  which  maximizes  performance,  in  a  Neyman-Pearson  sense  [4], 
[5],  is  in  fact  the  proposed  special  case  of  the  ideal  observer. 

In Sec. II, we consider the decision rule proposed by Chan et al. [11]; in Sec. III, that proposed
by He et al. [12], which is itself a special case (in which the decision variables used are the
logarithms  of  the  likelihood  ratios  of  the  data  being  classified)  of  the  decision  rule  proposed 
by  Scurfield  [9];  and,  in  Secs.  IV  and  V,  two  other  special  cases  of  the  Scurfield  decision  rule, 
in  which  the  decision  variables  are,  respectively,  the  likelihood  ratios  and  the  a  posteriori  class 
membership  probabilities  of  the  data  being  classified.  Finally,  we  summarize  these  results  and 
present  some  brief  conclusions  in  Sec.  VI. 


II.  The  Chan  et  al.  Observer 


The expected utility of the decisions made by an observer in an $N$-class classification task
can be expressed as [5]

\[
\begin{aligned}
E\{\mathbf{U}\} &= \sum_{i=1}^{N} \sum_{j=1}^{N} U_{i|j}\, P(\mathbf{d} = \pi_i, \mathbf{t} = \pi_j) \\
&= \sum_{i=1}^{N} \sum_{j=1}^{N} U_{i|j}\, P(\mathbf{d} = \pi_i \mid \mathbf{t} = \pi_j)\, P(\mathbf{t} = \pi_j),
\end{aligned} \tag{1}
\]

where the labels $\pi_1$ through $\pi_N$ identify the classes to which observations belong; the number
$U_{i|j}$ is defined as the utility of deciding an observation belongs to class $\pi_i$ given that it is
actually drawn from class $\pi_j$; and the random variables $\mathbf{t}$ and $\mathbf{d}$ indicate the true class to
which a randomly drawn observation belongs and the observer’s decision for classifying that
observation, respectively. For notational simplicity, we will write the conditional classification
rate $P(\mathbf{d} = \pi_i \mid \mathbf{t} = \pi_j)$ as $P_{ij}$ and the a priori class membership probability $P(\mathbf{t} = \pi_i)$ as $P(\pi_i)$.
For a three-class classification task, the expected utility can be written explicitly as

\[
\begin{aligned}
E\{\mathbf{U}\} = {} & [U_{1|1}P_{11} + U_{2|1}P_{21} + U_{3|1}P_{31}]\,P(\pi_1) \\
& + [U_{1|2}P_{12} + U_{2|2}P_{22} + U_{3|2}P_{32}]\,P(\pi_2) \\
& + [U_{1|3}P_{13} + U_{2|3}P_{23} + U_{3|3}P_{33}]\,P(\pi_3).
\end{aligned} \tag{2}
\]

Note that the nine conditional classification rates $P_{ij}$ appearing in this expression are not
independent; for example, given the definition of conditional probability, it must be the case
that $P_{11} + P_{21} + P_{31} = 1$. Thus within any pair of square brackets, one of the three conditional
classification rates can be eliminated, leaving an expression which depends in general on six
conditional classification rates.
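The double sum in (1) and its three-class form (2) can be evaluated directly. The Python sketch below is illustrative only: the function name, the row/column convention (`U[i][j]` holds $U_{i+1|j+1}$, `P[i][j]` holds $P_{i+1,j+1}$), and the numerical values are our assumptions, not taken from the cited work.

```python
def expected_utility(U, P, priors, tol=1e-9):
    """Evaluate the double sum of Eq. (1).

    U[i][j]   : utility of deciding class i+1 when the truth is class j+1
    P[i][j]   : P(d = pi_{i+1} | t = pi_{j+1}); each column must sum to 1
    priors[j] : a priori probability P(pi_{j+1})
    """
    n = len(priors)
    for j in range(n):
        if abs(sum(P[i][j] for i in range(n)) - 1.0) > tol:
            raise ValueError(f"column {j + 1} of P does not sum to 1")
    return sum(U[i][j] * P[i][j] * priors[j]
               for i in range(n) for j in range(n))


# Hypothetical example: a perfect classifier rewarded only for correct decisions.
identity = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
priors = [0.2, 0.3, 0.5]
perfect = expected_utility(identity, identity, priors)
```

The column-sum check encodes the constraint $P_{1j} + P_{2j} + P_{3j} = 1$ noted above, which is what reduces the nine rates to six independent ones.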

Chan et al. consider a classification task in which class $\pi_1$ represents “benign,” class $\pi_2$
“normal,” and class $\pi_3$ “malignant” observations (e.g., for structures evident in a medical
image) [11]. They simplify the expression in (2) by restricting all values of utility to lie between
0 and 1; by setting the “correct decision” utilities $U_{1|1}$, $U_{2|2}$, and $U_{3|3}$ to be 1; the “missed
malignancy” utilities $U_{1|3}$ and $U_{2|3}$ to be 0; and the utilities for incorrect decisions not involving
malignancies $U_{1|2}$ and $U_{2|1}$ to be 1. The remaining “false-positive” utilities $U_{3|1}$ and $U_{3|2}$ are free
to vary in the range [0, 1].

With these assumptions, the expression for expected utility is reduced to

\[
\begin{aligned}
E\{\mathbf{U}_{\mathrm{Chan}}\} = {} & [P_{11} + P_{21} + U_{3|1}P_{31}]\,P(\pi_1) \\
& + [P_{12} + P_{22} + U_{3|2}P_{32}]\,P(\pi_2) \\
& + P_{33}\,P(\pi_3).
\end{aligned} \tag{3}
\]

This can in turn be simplified further using the definition of conditional probability to yield

\[
\begin{aligned}
E\{\mathbf{U}_{\mathrm{Chan}}\} = {} & [1 - P_{31} + U_{3|1}P_{31}]\,P(\pi_1) \\
& + [1 - P_{32} + U_{3|2}P_{32}]\,P(\pi_2) \\
& + P_{33}\,P(\pi_3);
\end{aligned} \tag{4}
\]

as Chan et al. point out [11], this expression depends on three rather than six conditional
classification rates, namely $P_{31}$, $P_{32}$, and $P_{33}$. These three rates are used to construct the
ROC space in which they analyze the performance of their observer. That observer in turn is the
special case of the ideal observer obtained by imposing the above constraints on the decision
utilities $U_{i|j}$.
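The reduction from (2) to (4) can be checked numerically. The sketch below compares the full nine-term sum with the reduced three-rate expression under the utility constraints of Chan et al.; the function names and all numerical values are hypothetical and chosen only for illustration.

```python
def expected_utility(U, P, priors):
    """Full double sum of Eq. (2); U[i][j] = U_{i+1|j+1}, P[i][j] = P_{i+1,j+1}."""
    return sum(U[i][j] * P[i][j] * priors[j]
               for i in range(3) for j in range(3))


def chan_utility_matrix(u31, u32):
    """Utilities with the constraints described in the text (classes: 1 benign,
    2 normal, 3 malignant); only the false-positive utilities are free."""
    return [[1.0, 1.0, 0.0],
            [1.0, 1.0, 0.0],
            [u31, u32, 1.0]]


def chan_expected_utility(u31, u32, p31, p32, p33, priors):
    """Reduced form of Eq. (4), which needs only P31, P32 and P33."""
    return ((1.0 - p31 + u31 * p31) * priors[0]
            + (1.0 - p32 + u32 * p32) * priors[1]
            + p33 * priors[2])


# Hypothetical conditional classification rates (each column sums to 1).
P = [[0.6, 0.2, 0.1],
     [0.3, 0.7, 0.1],
     [0.1, 0.1, 0.8]]
priors = [0.3, 0.5, 0.2]
full = expected_utility(chan_utility_matrix(0.4, 0.7), P, priors)
reduced = chan_expected_utility(0.4, 0.7, P[2][0], P[2][1], P[2][2], priors)
```

Changing the first two rows of `P` while keeping each column sum at 1 and the third row fixed leaves both values unchanged, which is the sense in which the expected utility depends on only three rates.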

The three-class ideal observer makes decisions by partitioning a likelihood ratio decision
variable plane into three regions with three intersecting lines [4], [5]. The likelihood ratios can
be taken to be $\mathbf{LR}_1 = p(\mathbf{x} \mid \pi_1)/p(\mathbf{x} \mid \pi_3)$ and $\mathbf{LR}_2 = p(\mathbf{x} \mid \pi_2)/p(\mathbf{x} \mid \pi_3)$, ratios of the conditional
probability density functions of the observational data $\mathbf{x}$ taken as functions of that random
observational data. (We use boldface type to denote statistically variable quantities.) In the
notation we advocate [8], the equations for the three decision boundary lines are

\begin{align}
\gamma_{121}\mathbf{LR}_1 - \gamma_{212}\mathbf{LR}_2 &= \gamma_{313} - \gamma_{323} \tag{5} \\
\gamma_{131}\mathbf{LR}_1 + (\gamma_{232} - \gamma_{212})\mathbf{LR}_2 &= \gamma_{313} \tag{6} \\
(\gamma_{131} - \gamma_{121})\mathbf{LR}_1 + \gamma_{232}\mathbf{LR}_2 &= \gamma_{323}, \tag{7}
\end{align}

which we call, respectively, the “1-vs.-2” line, the “1-vs.-3” line, and the “2-vs.-3” line. Here
$\gamma_{iji} = (U_{i|i} - U_{j|i})P(\pi_i)$. Although we have found it useful to assume these quantities to be
strictly positive, this is not a fundamental requirement, and Chan et al. indeed allow some of
them (e.g., $\gamma_{121}$) to be zero (consistent with the constraints they place on the $U_{i|j}$ as described
above). They obtain the resulting ideal observer decision lines

\begin{align}
0\,\mathbf{LR}_1 - 0\,\mathbf{LR}_2 &= 0 && \{\text{``1-vs.-2''}\} \tag{8} \\
(1 - U_{3|1})P(\pi_1)\mathbf{LR}_1 + (1 - U_{3|2})P(\pi_2)\mathbf{LR}_2 &= P(\pi_3) && \{\text{``1-vs.-3''}\} \tag{9} \\
(1 - U_{3|1})P(\pi_1)\mathbf{LR}_1 + (1 - U_{3|2})P(\pi_2)\mathbf{LR}_2 &= P(\pi_3) && \{\text{``2-vs.-3''}\} \tag{10}
\end{align}

which actually correspond to a single line (as the first is undefined and the remaining two are
degenerate). This decision strategy is illustrated in Fig. 1.
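To make the degeneracy of (8)-(10) concrete, the following Python sketch computes the expected utility of each decision (up to the common positive factor $p(\mathbf{x} \mid \pi_3)$) for the Chan et al. utilities and compares the outcome with the single line (9). The function names and numerical values are illustrative assumptions of ours.

```python
def decision_scores(lr1, lr2, U, priors):
    """Expected utility of each decision, divided by p(x|pi_3).

    U[i][j] is the utility of deciding class i+1 when the truth is class j+1.
    """
    weights = [priors[0] * lr1, priors[1] * lr2, priors[2]]
    return [sum(U[i][j] * weights[j] for j in range(3)) for i in range(3)]


def chan_utility_matrix(u31, u32):
    """Utilities constrained as described for the Chan et al. observer."""
    return [[1.0, 1.0, 0.0],
            [1.0, 1.0, 0.0],
            [u31, u32, 1.0]]


def below_chan_line(lr1, lr2, u31, u32, priors):
    """True when (LR1, LR2) lies on the 'decide pi_3' side of line (9)."""
    lhs = (1.0 - u31) * priors[0] * lr1 + (1.0 - u32) * priors[1] * lr2
    return lhs < priors[2]


priors = [0.3, 0.5, 0.2]          # hypothetical a priori probabilities
u31, u32 = 0.4, 0.7               # hypothetical false-positive utilities
U = chan_utility_matrix(u31, u32)
for lr1, lr2 in [(0.1, 0.1), (0.5, 0.5), (1.0, 1.0), (2.0, 2.0)]:
    s = decision_scores(lr1, lr2, U, priors)
    print((lr1, lr2), s[0] == s[1], s[2] > s[0],
          below_chan_line(lr1, lr2, u31, u32, priors))
```

The scores for deciding $\pi_1$ and $\pi_2$ coincide at every point, so the “1-vs.-2” comparison never discriminates, while the comparison against $\pi_3$ reduces to the side of the line in (9) on which the point falls.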

In summary, Chan et al. begin with an ideal observer model, impose particular constraints
on the decision utilities in that model, and then determine, based on those constraints, both the
resulting form of the special case of the ideal observer and the conditional classification rates
appropriate to measuring its performance. We now wish to pose a question from a different
point of view: suppose one chooses to measure arbitrary (i.e., not necessarily ideal) observer
performance only in terms of the conditional classification rates $P_{33}$, $P_{31}$, and $P_{32}$, ignoring the
other rates. For any observer, we can construct an ROC surface with $P_{33}$ as a function of $P_{31}$
and $P_{32}$. (For an observer with more than two degrees of freedom in its decision strategy, one
can simply define the surface to be the maximum value of $P_{33}$ achievable at any given $(P_{31}, P_{32})$
pair.) What observer, if any, will achieve optimal performance with respect to this surface?

Fig. 1. The decision strategy investigated by Chan et al., which is a special case of the ideal observer decision strategy.
Observations in the unlabeled region are decided “not $\pi_3$,” i.e., either “$\pi_1$” or “$\pi_2$”.

A convenient method for defining “optimal performance” here is in terms of the Neyman-Pearson
criterion [4], [5]; the technique of satisfying the Neyman-Pearson criterion is essentially
an application of an integral form of the method of Lagrange multipliers [13]. We seek to
maximize $P_{33}$ at a particular point $(P_{31} = \alpha_{31}, P_{32} = \alpha_{32})$ in the domain of the given ROC
space. Another way of stating this is to consider $P_{33}$, $P_{31}$, and $P_{32}$ as functionals of the observer’s
decision rule; we seek to maximize $P_{33}$ subject to the constraints $P_{31} = \alpha_{31}$ and $P_{32} = \alpha_{32}$. To
find this maximum, we define a function

\[
F_{\mathrm{Chan}} = P_{33} + \lambda_{31}(P_{31} - \alpha_{31}) + \lambda_{32}(P_{32} - \alpha_{32}), \tag{11}
\]

where $\lambda_{31}$ and $\lambda_{32}$ are free parameters (the so-called Lagrange multipliers). Note that maximizing
$F_{\mathrm{Chan}}$ at the particular point $(P_{31} = \alpha_{31}, P_{32} = \alpha_{32})$ is equivalent to maximizing $P_{33}$ at that point;
if the maxima for arbitrary points $(P_{31}, P_{32})$ are achieved by a single decision rule independent
of $\alpha_{31}$ and $\alpha_{32}$, the resulting surface will be the desired optimal surface.

As stated in the material leading up to (5)-(7), the decisions here are assumed to be made
based on statistically variable observational data. Explicitly,

\[
P_{ij} = \int_{Z_i} p(\mathbf{x} \mid \pi_j)\, d^m\mathbf{x}, \tag{12}
\]

where $Z_i$ is the region for which observations $\mathbf{x}$ (of dimension $m$) are decided to belong to the
class labeled $\pi_i$ ($1 \le i \le 3$). The expression for $F_{\mathrm{Chan}}$ can then be simplified as follows:

\[
\begin{aligned}
F_{\mathrm{Chan}} &= 1 - P_{13} - P_{23} + \lambda_{31}P_{31} - \lambda_{31}\alpha_{31} + \lambda_{32}P_{32} - \lambda_{32}\alpha_{32} \\
&= 1 - \lambda_{31}\alpha_{31} - \lambda_{32}\alpha_{32} - \{P_{13} + P_{23} - \lambda_{31}P_{31} - \lambda_{32}P_{32}\} \\
&= 1 - \lambda_{31}\alpha_{31} - \lambda_{32}\alpha_{32} - \Big\{ \int_{Z_1} p(\mathbf{x} \mid \pi_3)\, d^m\mathbf{x} + \int_{Z_2} p(\mathbf{x} \mid \pi_3)\, d^m\mathbf{x} \\
&\qquad + \int_{Z_3} \big[ -\lambda_{31}\,p(\mathbf{x} \mid \pi_1) - \lambda_{32}\,p(\mathbf{x} \mid \pi_2) \big]\, d^m\mathbf{x} \Big\}.
\end{aligned} \tag{13}
\]

$F_{\mathrm{Chan}}$ is maximized when the quantity in braces is minimized. This quantity, in turn, can be
minimized by assigning a given $\mathbf{x}$ to the region $Z_i$ such that the $i$th integrand (from among the
integrals in braces in (13)) is minimized. (Situations in which two or more of the integrands
yield the same minimal value for a given $\mathbf{x}$ can be decided in an arbitrary but consistent fashion.)
That is,

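\begin{align}
\text{decide } \pi_1 \text{ iff } & p(\mathbf{x} \mid \pi_3) < p(\mathbf{x} \mid \pi_3) \text{ and } p(\mathbf{x} \mid \pi_3) < -\lambda_{31}\,p(\mathbf{x} \mid \pi_1) - \lambda_{32}\,p(\mathbf{x} \mid \pi_2) \tag{14} \\
\text{decide } \pi_2 \text{ iff } & p(\mathbf{x} \mid \pi_3) < p(\mathbf{x} \mid \pi_3) \text{ and } p(\mathbf{x} \mid \pi_3) < -\lambda_{31}\,p(\mathbf{x} \mid \pi_1) - \lambda_{32}\,p(\mathbf{x} \mid \pi_2) \tag{15} \\
\text{decide } \pi_3 \text{ iff } & -\lambda_{31}\,p(\mathbf{x} \mid \pi_1) - \lambda_{32}\,p(\mathbf{x} \mid \pi_2) < p(\mathbf{x} \mid \pi_3) \nonumber \\
& \text{and } -\lambda_{31}\,p(\mathbf{x} \mid \pi_1) - \lambda_{32}\,p(\mathbf{x} \mid \pi_2) < p(\mathbf{x} \mid \pi_3). \tag{16}
\end{align}

We can divide these relations by $p(\mathbf{x} \mid \pi_3)$ to obtain

\begin{align}
\text{decide } \pi_1 \text{ iff } & 0\,\mathbf{LR}_1 - 0\,\mathbf{LR}_2 > 0 \text{ and } -\lambda_{31}\mathbf{LR}_1 - \lambda_{32}\mathbf{LR}_2 > 1 \tag{17} \\
\text{decide } \pi_2 \text{ iff } & 0\,\mathbf{LR}_1 - 0\,\mathbf{LR}_2 < 0 \text{ and } -\lambda_{31}\mathbf{LR}_1 - \lambda_{32}\mathbf{LR}_2 > 1 \tag{18} \\
\text{decide } \pi_3 \text{ iff } & -\lambda_{31}\mathbf{LR}_1 - \lambda_{32}\mathbf{LR}_2 < 1 \text{ and } -\lambda_{31}\mathbf{LR}_1 - \lambda_{32}\mathbf{LR}_2 < 1. \tag{19}
\end{align}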


(We assume without loss of generality that $p(\mathbf{x} \mid \pi_3) > 0$, because the task reduces to a two-class
problem for values of $\mathbf{x}$ such that $p(\mathbf{x} \mid \pi_3) = 0$.) The boundary lines which partition the
$(\mathbf{LR}_1, \mathbf{LR}_2)$ decision variable plane into the regions $Z_1$, $Z_2$, and $Z_3$ are thus

\begin{align}
0\,\mathbf{LR}_1 - 0\,\mathbf{LR}_2 &= 0 && \{\text{``1-vs.-2''}\} \tag{20} \\
-\lambda_{31}\mathbf{LR}_1 - \lambda_{32}\mathbf{LR}_2 &= 1 && \{\text{``1-vs.-3''}\} \tag{21} \\
-\lambda_{31}\mathbf{LR}_1 - \lambda_{32}\mathbf{LR}_2 &= 1 && \{\text{``2-vs.-3''}\}. \tag{22}
\end{align}

If we require $\lambda_{31}$ and $\lambda_{32}$ to be nonpositive, and then define the quantities $U_{3|1}$ and $U_{3|2}$ such
that $-\lambda_{31} = (1 - U_{3|1})P(\pi_1)/P(\pi_3)$ and $-\lambda_{32} = (1 - U_{3|2})P(\pi_2)/P(\pi_3)$, the resulting decision
strategy is found to be identical to that stated in (8)-(10). The special case of the ideal observer
proposed by Chan et al., whose performance depends only on the conditional classification rates
$P_{33}$, $P_{31}$, and $P_{32}$ by (4), is indeed the observer which obtains optimal performance with respect
to this set of conditional classification rates.
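The pointwise-assignment argument used above can be illustrated on a toy problem. The Python sketch below assumes a discrete observation space with four possible observations (so that the integrals in (12) become sums) and checks by brute force over all 81 partitions that the rule in (19) maximizes $P_{33} + \lambda_{31}P_{31} + \lambda_{32}P_{32}$; the constant terms in $\alpha_{31}$ and $\alpha_{32}$ are omitted because they do not depend on the partition. All names and numbers are hypothetical.

```python
from itertools import product


def restricted_objective(assignment, p1, p2, p3, lam31, lam32):
    """P33 + lam31*P31 + lam32*P32 for a partition of a discrete space.

    assignment[k] in {1, 2, 3} is the class decided for observation k;
    p1, p2, p3 are the class-conditional probabilities of each observation.
    """
    in_z3 = [k for k, region in enumerate(assignment) if region == 3]
    p33 = sum(p3[k] for k in in_z3)
    p31 = sum(p1[k] for k in in_z3)
    p32 = sum(p2[k] for k in in_z3)
    return p33 + lam31 * p31 + lam32 * p32


def pointwise_rule(p1, p2, p3, lam31, lam32):
    """Decide pi_3 iff -lam31*p1 - lam32*p2 < p3, i.e. Eq. (19) multiplied by p3.

    Outside Z3 the choice between pi_1 and pi_2 is a tie; class 1 is an
    arbitrary but consistent choice.
    """
    return tuple(3 if -lam31 * a - lam32 * b < c else 1
                 for a, b, c in zip(p1, p2, p3))


p1 = [0.4, 0.3, 0.2, 0.1]
p2 = [0.25, 0.25, 0.25, 0.25]
p3 = [0.1, 0.2, 0.3, 0.4]
lam31, lam32 = -0.8, -0.5          # nonpositive multipliers, as required in the text

rule = pointwise_rule(p1, p2, p3, lam31, lam32)
best = max(restricted_objective(a, p1, p2, p3, lam31, lam32)
           for a in product((1, 2, 3), repeat=len(p1)))
achieved = restricted_objective(rule, p1, p2, p3, lam31, lam32)
```

A brute-force search of this kind is feasible only for very small spaces; it is meant as a sanity check on the argument, not as a practical way to find the optimal observer.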

III.  The  He  et  al.  Observer 

He et al. also begin with an ideal observer model and thus with the expression for expected
utility given in (2); the classification task of interest to them is to distinguish two types of
abnormal cardiac ejection from normal cardiac behavior in nuclear medicine studies [12]. They
simplify this expression by requiring that the utilities of the two possible incorrect classifications
of observations actually from a given class be equal. That is, $U_{2|1} = U_{3|1}$, $U_{1|2} = U_{3|2}$, and
$U_{1|3} = U_{2|3}$. The expression for expected utility is thereby reduced to

\[
\begin{aligned}
E\{\mathbf{U}_{\mathrm{He}}\} = {} & [U_{1|1}P_{11} + U_{2|1}(P_{21} + P_{31})]\,P(\pi_1) \\
& + [U_{2|2}P_{22} + U_{1|2}(P_{12} + P_{32})]\,P(\pi_2) \\
& + [U_{3|3}P_{33} + U_{1|3}(P_{13} + P_{23})]\,P(\pi_3).
\end{aligned} \tag{23}
\]

This can in turn be simplified further using the definition of conditional probability to yield

\[
\begin{aligned}
E\{\mathbf{U}_{\mathrm{He}}\} = {} & [U_{2|1} + (U_{1|1} - U_{2|1})P_{11}]\,P(\pi_1) \\
& + [U_{1|2} + (U_{2|2} - U_{1|2})P_{22}]\,P(\pi_2) \\
& + [U_{1|3} + (U_{3|3} - U_{1|3})P_{33}]\,P(\pi_3);
\end{aligned} \tag{24}
\]

as He et al. point out [12], this expression depends on only the three “sensitivities” $P_{11}$, $P_{22}$, and
$P_{33}$, rather than six conditional classification rates. The three sensitivities are used to construct the
ROC space (equivalent to that proposed by Mossman [10]) in which they analyze the performance
of their observer. That observer in turn is the special case of the ideal observer obtained by
imposing the above constraints on the decision utilities $U_{i|j}$.

Fig. 2. The decision strategy investigated by He et al., which is a special case of the ideal observer decision strategy, and
which can also be shown to be a special case of the Scurfield observer with decision variables equal to the logarithms of the
likelihood ratios of the observational data.

Applying the stated constraints on the utilities to the ideal observer decision boundary lines
given in (5)-(7) yields

\begin{align}
\gamma_{121}\mathbf{LR}_1 - \gamma_{212}\mathbf{LR}_2 &= 0 \tag{25} \\
\gamma_{121}\mathbf{LR}_1 &= \gamma_{313} \tag{26} \\
\gamma_{212}\mathbf{LR}_2 &= \gamma_{313}. \tag{27}
\end{align}

This decision strategy is illustrated in Fig. 2. We have recently shown [14] that this decision
strategy is a special case of that proposed by Scurfield [9] when the decision variables used by
the Scurfield observer are the logarithms of the likelihood ratios of the observational data.

We now consider evaluating the performance of an arbitrary observer in the ROC space
constructed only from the observer’s sensitivities (i.e., $P_{11}$, $P_{22}$, and $P_{33}$). Without loss of
generality, we can define such an observer’s ROC surface as $P_{33}$ considered as a function of
$P_{11}$ and $P_{22}$; to find the optimal observer with respect to this restricted performance evaluation
method, we apply the Neyman-Pearson criterion to maximize $P_{33}$ subject to the constraints
$(P_{11} = \alpha_{11}, P_{22} = \alpha_{22})$. We define the function

\[
F_{\mathrm{He}} = P_{33} + \lambda_{11}(P_{11} - \alpha_{11}) + \lambda_{22}(P_{22} - \alpha_{22}), \tag{28}
\]

where $\lambda_{11}$ and $\lambda_{22}$ are again the Lagrange multipliers.

Using (12), this can be simplified to yield

\[
\begin{aligned}
F_{\mathrm{He}} &= 1 - P_{13} - P_{23} + \lambda_{11}(1 - P_{21} - P_{31}) - \lambda_{11}\alpha_{11} + \lambda_{22}(1 - P_{12} - P_{32}) - \lambda_{22}\alpha_{22} \\
&= 1 + \lambda_{11}(1 - \alpha_{11}) + \lambda_{22}(1 - \alpha_{22}) - \{P_{13} + P_{23} + \lambda_{11}(P_{21} + P_{31}) + \lambda_{22}(P_{12} + P_{32})\} \\
&= 1 + \lambda_{11}(1 - \alpha_{11}) + \lambda_{22}(1 - \alpha_{22}) - \Big\{ \int_{Z_1} \big[ \lambda_{22}\,p(\mathbf{x} \mid \pi_2) + p(\mathbf{x} \mid \pi_3) \big]\, d^m\mathbf{x} \\
&\qquad + \int_{Z_2} \big[ \lambda_{11}\,p(\mathbf{x} \mid \pi_1) + p(\mathbf{x} \mid \pi_3) \big]\, d^m\mathbf{x} + \int_{Z_3} \big[ \lambda_{11}\,p(\mathbf{x} \mid \pi_1) + \lambda_{22}\,p(\mathbf{x} \mid \pi_2) \big]\, d^m\mathbf{x} \Big\}.
\end{aligned} \tag{29}
\]

$F_{\mathrm{He}}$ is maximized when the quantity in braces is minimized. This quantity, in turn, can be
minimized by assigning a given $\mathbf{x}$ to the region $Z_i$ such that the $i$th integrand (from among the
integrals in braces in (29)) is minimized. (Situations in which two or more of the integrands
yield the same minimal value for a given $\mathbf{x}$ can be decided in an arbitrary but consistent fashion.)
That is,


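\begin{align}
\text{decide } \pi_1 \text{ iff } & \lambda_{22}\,p(\mathbf{x} \mid \pi_2) < \lambda_{11}\,p(\mathbf{x} \mid \pi_1) \text{ and } p(\mathbf{x} \mid \pi_3) < \lambda_{11}\,p(\mathbf{x} \mid \pi_1) \tag{30} \\
\text{decide } \pi_2 \text{ iff } & \lambda_{11}\,p(\mathbf{x} \mid \pi_1) < \lambda_{22}\,p(\mathbf{x} \mid \pi_2) \text{ and } p(\mathbf{x} \mid \pi_3) < \lambda_{22}\,p(\mathbf{x} \mid \pi_2) \tag{31} \\
\text{decide } \pi_3 \text{ iff } & \lambda_{11}\,p(\mathbf{x} \mid \pi_1) < p(\mathbf{x} \mid \pi_3) \text{ and } \lambda_{22}\,p(\mathbf{x} \mid \pi_2) < p(\mathbf{x} \mid \pi_3). \tag{32}
\end{align}

We can divide these relations by $p(\mathbf{x} \mid \pi_3)$ to obtain

\begin{align}
\text{decide } \pi_1 \text{ iff } & \lambda_{11}\mathbf{LR}_1 - \lambda_{22}\mathbf{LR}_2 > 0 \text{ and } \lambda_{11}\mathbf{LR}_1 > 1 \tag{33} \\
\text{decide } \pi_2 \text{ iff } & \lambda_{11}\mathbf{LR}_1 - \lambda_{22}\mathbf{LR}_2 < 0 \text{ and } \lambda_{22}\mathbf{LR}_2 > 1 \tag{34} \\
\text{decide } \pi_3 \text{ iff } & \lambda_{11}\mathbf{LR}_1 < 1 \text{ and } \lambda_{22}\mathbf{LR}_2 < 1. \tag{35}
\end{align}

The boundary lines which partition the $(\mathbf{LR}_1, \mathbf{LR}_2)$ decision variable plane into the regions $Z_1$,
$Z_2$, and $Z_3$ are thus

\begin{align}
\lambda_{11}\mathbf{LR}_1 - \lambda_{22}\mathbf{LR}_2 &= 0 && \{\text{``1-vs.-2''}\} \tag{36} \\
\lambda_{11}\mathbf{LR}_1 &= 1 && \{\text{``1-vs.-3''}\} \tag{37} \\
\lambda_{22}\mathbf{LR}_2 &= 1 && \{\text{``2-vs.-3''}\}. \tag{38}
\end{align}

If we require $\lambda_{11}$ and $\lambda_{22}$ to be positive, and define the quantities $\gamma_{121} = \lambda_{11}\gamma_{313}$ and
$\gamma_{212} = \lambda_{22}\gamma_{313}$ for some arbitrary positive $\gamma_{313}$, then the resulting decision strategy is found to be
identical to that stated in (25)-(27). The special case of the ideal observer proposed by He
et al., whose performance depends only on the conditional classification rates $P_{11}$, $P_{22}$, and $P_{33}$
by (24), is indeed the observer which obtains optimal performance with respect to this set of
conditional classification rates.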
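The equivalence just stated can be spot-checked numerically. In the Python sketch below, the rule of (33)-(35) is written as a three-way comparison of $\lambda_{11}\mathbf{LR}_1$, $\lambda_{22}\mathbf{LR}_2$, and 1, and is compared with an ideal observer that maximizes expected utility under utilities satisfying the constraints of He et al. The utility values, priors, and function names are hypothetical, chosen only for illustration.

```python
def he_rule(lr1, lr2, lam11, lam22):
    """Eqs. (33)-(35): pick the largest of lam11*LR1, lam22*LR2 and 1.

    Ties are broken in favour of the lower class index (arbitrary but consistent).
    """
    candidates = {1: lam11 * lr1, 2: lam22 * lr2, 3: 1.0}
    return max(candidates, key=candidates.get)


def ideal_observer(lr1, lr2, U, priors):
    """Decide the class with the largest expected utility (divided by p(x|pi_3))."""
    w = [priors[0] * lr1, priors[1] * lr2, priors[2]]
    scores = {i + 1: sum(U[i][j] * w[j] for j in range(3)) for i in range(3)}
    return max(scores, key=scores.get)


def he_multipliers(U, priors):
    """lam11 = gamma_121 / gamma_313 and lam22 = gamma_212 / gamma_313."""
    g121 = (U[0][0] - U[1][0]) * priors[0]
    g212 = (U[1][1] - U[0][1]) * priors[1]
    g313 = (U[2][2] - U[0][2]) * priors[2]
    return g121 / g313, g212 / g313


# U[i][j] = U_{i+1|j+1}; chosen so that U21 = U31, U12 = U32, U13 = U23.
U = [[1.0, 0.2, 0.1],
     [0.3, 1.0, 0.1],
     [0.3, 0.2, 1.0]]
priors = [0.3, 0.3, 0.4]
lam11, lam22 = he_multipliers(U, priors)

points = [(3.0, 0.5), (0.5, 3.0), (0.5, 0.5), (4.0, 5.0)]
decisions = [(he_rule(a, b, lam11, lam22), ideal_observer(a, b, U, priors))
             for a, b in points]
```

At each test point the two decisions agree, as expected from the identification $\gamma_{121} = \lambda_{11}\gamma_{313}$ and $\gamma_{212} = \lambda_{22}\gamma_{313}$.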

IV.  The  Scurfield  Observer  (Likelihood  Ratio) 

In  the  preceding  two  sections,  we  considered  decision  strategies  that  have  been  proposed  by 
other  researchers  as  special  cases  of  the  three-class  ideal  observer  decision  strategy.  That  is, 
particular  constraints  were  explicitly  imposed  in  the  work  cited  on  the  decision  utilities  used 
by  the  ideal  observer.  The  remaining  two  decision  strategies  we  consider  in  the  present  work 
are  special  cases  of  a  decision  strategy  proposed  by  Scurfield  [9]  which  was  not  claimed  to  be 
generally  related  to  the  ideal  observer;  specifically,  Scurfield  specified  the  decision  boundary 
lines  used  by  the  observer,  but  made  no  assumptions  concerning  the  observer’s  two  decision 
variables. 

We  showed  recently  [14]  that  if  particular  forms  of  the  observer’s  decision  variables  related  to 
the  likelihood  ratios  of  the  observational  data  are  chosen,  then  the  resulting  decision  strategies 
can  be  shown  to  be  special  cases  of  the  ideal  observer  decision  strategy.  One  such  special  case 
is the observer analyzed by He et al. [12], discussed in Sec. III, in which the decision variables
used  by  the  Scurfield  observer  are  the  logarithms  of  the  likelihood  ratios.  Two  other  such  special 
cases  are  the  Scurfield  observer  with  the  likelihood  ratios  themselves  as  decision  variables,  which 
we  consider  in  this  section;  and  that  with  the  a  posteriori  class  membership  probabilities  used 
as  decision  variables,  considered  in  Sec.  V.  A  minor  difference  from  the  preceding  two  sections 
is that we must determine the implicit constraints on the ideal observer’s utilities from the
known  form  of  the  decision  rule,  rather  than  the  other  way  around. 

The general Scurfield observer makes decisions by partitioning a decision variable plane
$(y_1, y_2)$ into three regions via the decision boundary lines

\begin{align}
y_1 - y_2 &= \gamma_1 - \gamma_2 \tag{39} \\
y_1 &= \gamma_1 \tag{40} \\
y_2 &= \gamma_2, \tag{41}
\end{align}

where $\gamma_1$ and $\gamma_2$ are parameters upon which the observer’s performance depends (roughly
equivalent to the decision criterion of a two-class classifier). When the decision variables are
themselves the likelihood ratios $(\mathbf{LR}_1, \mathbf{LR}_2)$, this becomes in our notation

\begin{align}
\mathbf{LR}_1 - \mathbf{LR}_2 &= \frac{\gamma_{313} - \gamma_{323}}{\gamma_{121}} \tag{42} \\
\mathbf{LR}_1 &= \frac{\gamma_{313}}{\gamma_{121}} \tag{43} \\
\mathbf{LR}_2 &= \frac{\gamma_{323}}{\gamma_{121}}. \tag{44}
\end{align}

(Compare (39)-(41) with (5)-(7), and note that in order for the “1-vs.-2” line to have unit slope,
it must be the case that $\gamma_{121} = \gamma_{212}$.) This decision strategy is illustrated in Fig. 3.

Fig. 3. A special case of the decision strategy investigated by Scurfield, in which the decision variables used are the likelihood
ratios $(\mathbf{LR}_1, \mathbf{LR}_2)$ of the observational data.

May 30, 2006 DRAFT

The relations $\gamma_{121} = \gamma_{131}$ and $\gamma_{212} = \gamma_{232}$ evident from the above equations immediately
give the constraints on the decision utilities $U_{2|1} = U_{3|1}$ and $U_{1|2} = U_{3|2}$. Furthermore, the
relation $\gamma_{121} = \gamma_{212}$ gives $(U_{1|1} - U_{2|1})P(\pi_1) = (U_{2|2} - U_{1|2})P(\pi_2)$. (Recall from Sec. II that
$\gamma_{iji} = (U_{i|i} - U_{j|i})P(\pi_i)$.) This allows us to simplify the expression for expected utility in (2) to

\[
\begin{aligned}
E\{\mathbf{U}_{\mathrm{Scurfield:LR}}\} = {} & [U_{1|1}P_{11} + U_{2|1}(P_{21} + P_{31})]\,P(\pi_1) \\
& + [U_{2|2}P_{22} + U_{1|2}(P_{12} + P_{32})]\,P(\pi_2) \\
& + [U_{1|3}P_{13} + U_{2|3}P_{23} + U_{3|3}P_{33}]\,P(\pi_3).
\end{aligned} \tag{45}
\]

This can in turn be simplified further using the definition of conditional probability to yield

\[
\begin{aligned}
E\{\mathbf{U}_{\mathrm{Scurfield:LR}}\} = {} & [U_{1|1}P_{11} + U_{2|1}(1 - P_{11})]\,P(\pi_1) \\
& + [U_{2|2}P_{22} + U_{1|2}(1 - P_{22})]\,P(\pi_2) \\
& + [U_{1|3}P_{13} + U_{2|3}P_{23} + U_{3|3}(1 - P_{13} - P_{23})]\,P(\pi_3) \\
= {} & [U_{2|1} + (U_{1|1} - U_{2|1})P_{11}]\,P(\pi_1) \\
& + [U_{1|2} + (U_{2|2} - U_{1|2})P_{22}]\,P(\pi_2) \\
& + [U_{3|3} + (U_{1|3} - U_{3|3})P_{13} + (U_{2|3} - U_{3|3})P_{23}]\,P(\pi_3) \\
= {} & U_{2|1}P(\pi_1) + U_{1|2}P(\pi_2) + U_{3|3}P(\pi_3) \\
& + (P_{11} + P_{22})(U_{1|1} - U_{2|1})P(\pi_1) \\
& + [P_{13}(U_{1|3} - U_{3|3}) + P_{23}(U_{2|3} - U_{3|3})]\,P(\pi_3).
\end{aligned} \tag{46}
\]

This expression for the observer’s expected utility depends on only three terms related to
conditional classification rates: $P_{13}$ and $P_{23}$, which may be regarded as the misclassification
rates for observations actually drawn from class $\pi_3$; and $P_{11} + P_{22}$, which may be regarded as
the “total sensitivity” for observations actually drawn from classes $\pi_1$ and $\pi_2$ (ignoring the a
priori rates for such observations).

We now consider evaluating the performance of an arbitrary observer in an ROC-like space
constructed from the quantities $P_{11} + P_{22}$, $P_{13}$, and $P_{23}$. We will define the ROC-like surface
used to evaluate observer performance as the first quantity considered as a function of the two
misclassification rates. To find the optimal observer with respect to this restricted performance
evaluation method, we apply the Neyman-Pearson criterion to maximize $P_{11} + P_{22}$ subject to
the constraints $(P_{13} = \alpha_{13}, P_{23} = \alpha_{23})$. We define the function

\[
F_{\mathrm{Scurfield:LR}} = P_{11} + P_{22} + \lambda_{13}(P_{13} - \alpha_{13}) + \lambda_{23}(P_{23} - \alpha_{23}), \tag{47}
\]

where $\lambda_{13}$ and $\lambda_{23}$ are the Lagrange multipliers.

Using (12), this can be simplified to yield

\[
\begin{aligned}
F_{\mathrm{Scurfield:LR}} &= 1 - P_{21} - P_{31} + 1 - P_{12} - P_{32} + \lambda_{13}P_{13} - \lambda_{13}\alpha_{13} + \lambda_{23}P_{23} - \lambda_{23}\alpha_{23} \\
&= 2 - \lambda_{13}\alpha_{13} - \lambda_{23}\alpha_{23} - \{P_{21} + P_{31} + P_{12} + P_{32} - \lambda_{13}P_{13} - \lambda_{23}P_{23}\} \\
&= 2 - \lambda_{13}\alpha_{13} - \lambda_{23}\alpha_{23} - \Big\{ \int_{Z_1} \big[ p(\mathbf{x} \mid \pi_2) - \lambda_{13}\,p(\mathbf{x} \mid \pi_3) \big]\, d^m\mathbf{x} \\
&\qquad + \int_{Z_2} \big[ p(\mathbf{x} \mid \pi_1) - \lambda_{23}\,p(\mathbf{x} \mid \pi_3) \big]\, d^m\mathbf{x} + \int_{Z_3} \big[ p(\mathbf{x} \mid \pi_1) + p(\mathbf{x} \mid \pi_2) \big]\, d^m\mathbf{x} \Big\}.
\end{aligned} \tag{48}
\]

$F_{\mathrm{Scurfield:LR}}$ is maximized when the quantity in braces is minimized. This quantity, in turn, can
be minimized by assigning a given $\mathbf{x}$ to the region $Z_i$ such that the $i$th integrand (from among
the integrals in braces in (48)) is minimized. (Situations in which two or more of the integrands
yield the same minimal value for a given $\mathbf{x}$ can be decided in an arbitrary but consistent fashion.)
That is,


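\begin{align}
\text{decide } \pi_1 \text{ iff } & p(\mathbf{x} \mid \pi_2) - \lambda_{13}\,p(\mathbf{x} \mid \pi_3) < p(\mathbf{x} \mid \pi_1) - \lambda_{23}\,p(\mathbf{x} \mid \pi_3) \nonumber \\
& \text{and } -\lambda_{13}\,p(\mathbf{x} \mid \pi_3) < p(\mathbf{x} \mid \pi_1) \tag{49} \\
\text{decide } \pi_2 \text{ iff } & p(\mathbf{x} \mid \pi_1) - \lambda_{23}\,p(\mathbf{x} \mid \pi_3) < p(\mathbf{x} \mid \pi_2) - \lambda_{13}\,p(\mathbf{x} \mid \pi_3) \nonumber \\
& \text{and } -\lambda_{23}\,p(\mathbf{x} \mid \pi_3) < p(\mathbf{x} \mid \pi_2) \tag{50} \\
\text{decide } \pi_3 \text{ iff } & p(\mathbf{x} \mid \pi_1) < -\lambda_{13}\,p(\mathbf{x} \mid \pi_3) \text{ and } p(\mathbf{x} \mid \pi_2) < -\lambda_{23}\,p(\mathbf{x} \mid \pi_3). \tag{51}
\end{align}

We can divide these relations by $p(\mathbf{x} \mid \pi_3)$ to obtain

\begin{align}
\text{decide } \pi_1 \text{ iff } & \mathbf{LR}_1 - \mathbf{LR}_2 > -\lambda_{13} + \lambda_{23} \text{ and } \mathbf{LR}_1 > -\lambda_{13} \tag{52} \\
\text{decide } \pi_2 \text{ iff } & \mathbf{LR}_1 - \mathbf{LR}_2 < -\lambda_{13} + \lambda_{23} \text{ and } \mathbf{LR}_2 > -\lambda_{23} \tag{53} \\
\text{decide } \pi_3 \text{ iff } & \mathbf{LR}_1 < -\lambda_{13} \text{ and } \mathbf{LR}_2 < -\lambda_{23}. \tag{54}
\end{align}

The boundary lines which partition the $(\mathbf{LR}_1, \mathbf{LR}_2)$ decision variable plane into the regions $Z_1$,
$Z_2$, and $Z_3$ are thus

\begin{align}
\mathbf{LR}_1 - \mathbf{LR}_2 &= -\lambda_{13} + \lambda_{23} && \{\text{``1-vs.-2''}\} \tag{55} \\
\mathbf{LR}_1 &= -\lambda_{13} && \{\text{``1-vs.-3''}\} \tag{56} \\
\mathbf{LR}_2 &= -\lambda_{23} && \{\text{``2-vs.-3''}\}. \tag{57}
\end{align}

If we require $\lambda_{13}$ and $\lambda_{23}$ to be negative, and define the quantities $\gamma_{313} = -\lambda_{13}\gamma_{121}$ and
$\gamma_{323} = -\lambda_{23}\gamma_{121}$ for some arbitrary positive $\gamma_{121}$, then the resulting decision strategy is found to be
identical to that stated in (42)-(44). This special case of the observer proposed by Scurfield,
which we have shown to be a special case of the ideal observer [14], has a performance that
depends only on the quantities $P_{11} + P_{22}$, $P_{13}$, and $P_{23}$ by (46). This is indeed the observer
which obtains optimal performance with respect to this set of quantities related to the conditional
classification rates.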
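The three-region rule of (52)-(54) can be written compactly as choosing the largest of $\mathbf{LR}_1 - \gamma_1$, $\mathbf{LR}_2 - \gamma_2$, and 0, with $\gamma_1 = -\lambda_{13}$ and $\gamma_2 = -\lambda_{23}$. The Python sketch below, which assumes a toy discrete observation space with hypothetical probabilities, confirms by brute force that this rule maximizes $P_{11} + P_{22} + \lambda_{13}P_{13} + \lambda_{23}P_{23}$ (the constant terms in $\alpha_{13}$ and $\alpha_{23}$ are dropped). Function names are ours.

```python
from itertools import product


def scurfield_rule(y1, y2, gamma1, gamma2):
    """General Scurfield rule with boundary lines (39)-(41).

    Class 1 iff y1 - gamma1 is the largest margin, class 2 iff y2 - gamma2 is,
    class 3 iff both margins are negative. Ties go to the lower class index.
    """
    margins = (y1 - gamma1, y2 - gamma2, 0.0)
    return margins.index(max(margins)) + 1


def objective(assignment, p1, p2, p3, lam13, lam23):
    """P11 + P22 + lam13*P13 + lam23*P23 on a discrete observation space."""
    total = 0.0
    for k, region in enumerate(assignment):
        if region == 1:
            total += p1[k] + lam13 * p3[k]
        elif region == 2:
            total += p2[k] + lam23 * p3[k]
    return total


p1 = [0.4, 0.3, 0.2, 0.1]
p2 = [0.25, 0.25, 0.25, 0.25]
p3 = [0.1, 0.2, 0.3, 0.4]
lam13, lam23 = -1.0, -1.5          # negative multipliers, as required in the text

# Likelihood ratios as decision variables, thresholds gamma_i = -lambda_i3.
rule = tuple(scurfield_rule(a / c, b / c, -lam13, -lam23)
             for a, b, c in zip(p1, p2, p3))
best = max(objective(a, p1, p2, p3, lam13, lam23)
           for a in product((1, 2, 3), repeat=len(p1)))
achieved = objective(rule, p1, p2, p3, lam13, lam23)
```

The same `scurfield_rule` function applies unchanged to other choices of decision variables; only the inputs `y1` and `y2` differ.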


V.  The  Scurfield  Observer  (a  posteriori  Class  Probability) 

Equations (39)-(41) in Sec. IV give the equations for the decision boundary lines of the general
Scurfield observer. If we now use two of the a posteriori class membership probabilities, such
as $P(\pi_1 \mid \mathbf{x})$ and $P(\pi_2 \mid \mathbf{x})$, as the decision variables, the equations become

\begin{align}
P(\pi_1 \mid \mathbf{x}) - P(\pi_2 \mid \mathbf{x}) &= \gamma_1 - \gamma_2 \tag{58} \\
P(\pi_1 \mid \mathbf{x}) &= \gamma_1 \tag{59} \\
P(\pi_2 \mid \mathbf{x}) &= \gamma_2, \tag{60}
\end{align}

with $0 < \gamma_1 < 1$ and $0 < \gamma_2 < 1$. (Note that $P(\pi_3 \mid \mathbf{x}) = 1 - P(\pi_1 \mid \mathbf{x}) - P(\pi_2 \mid \mathbf{x})$, meaning
this third probability is not needed as an independent decision variable; the particular choice
of which two probabilities to use is of course arbitrary.) This decision strategy, which we have
shown recently to be a special case of the ideal observer decision strategy [14], is illustrated in
Fig. 4.

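We can reexpress the above equations in terms of likelihood ratios by exploiting the relation

\[
P(\pi_i \mid \mathbf{x}) = \frac{p(\mathbf{x} \mid \pi_i)P(\pi_i)}{p(\mathbf{x})}
= \frac{\mathbf{LR}_i\,[P(\pi_i)/P(\pi_3)]}{1 + \mathbf{LR}_1[P(\pi_1)/P(\pi_3)] + \mathbf{LR}_2[P(\pi_2)/P(\pi_3)]}, \tag{61}
\]

where the second equation is obtained by dividing the numerator and denominator by $p(\mathbf{x} \mid \pi_3)P(\pi_3)$.

Fig. 4. A special case of the decision strategy investigated by Scurfield, in which the decision variables used are the a posteriori
class membership probabilities $P(\pi_1 \mid \mathbf{x})$ and $P(\pi_2 \mid \mathbf{x})$ of the observational data.

The equations for the decision boundary lines become

\begin{align}
\mathbf{LR}_1\frac{P(\pi_1)}{P(\pi_3)} - \mathbf{LR}_2\frac{P(\pi_2)}{P(\pi_3)} &= (\gamma_1 - \gamma_2)\left(1 + \mathbf{LR}_1\frac{P(\pi_1)}{P(\pi_3)} + \mathbf{LR}_2\frac{P(\pi_2)}{P(\pi_3)}\right) \tag{62} \\
\mathbf{LR}_1\frac{P(\pi_1)}{P(\pi_3)} &= \gamma_1\left(1 + \mathbf{LR}_1\frac{P(\pi_1)}{P(\pi_3)} + \mathbf{LR}_2\frac{P(\pi_2)}{P(\pi_3)}\right) \tag{63} \\
\mathbf{LR}_2\frac{P(\pi_2)}{P(\pi_3)} &= \gamma_2\left(1 + \mathbf{LR}_1\frac{P(\pi_1)}{P(\pi_3)} + \mathbf{LR}_2\frac{P(\pi_2)}{P(\pi_3)}\right), \tag{64}
\end{align}

which can in turn be simplified to yield

\begin{align}
[1 - (\gamma_1 - \gamma_2)]P(\pi_1)\mathbf{LR}_1 - [1 + (\gamma_1 - \gamma_2)]P(\pi_2)\mathbf{LR}_2 &= (\gamma_1 - \gamma_2)P(\pi_3) \tag{65} \\
(1 - \gamma_1)P(\pi_1)\mathbf{LR}_1 - \gamma_1 P(\pi_2)\mathbf{LR}_2 &= \gamma_1 P(\pi_3) \tag{66} \\
-\gamma_2 P(\pi_1)\mathbf{LR}_1 + (1 - \gamma_2)P(\pi_2)\mathbf{LR}_2 &= \gamma_2 P(\pi_3). \tag{67}
\end{align}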
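The relation (61) and the equivalence between a threshold on $P(\pi_1 \mid \mathbf{x})$ and the linear boundary (66) can be checked with a few lines of Python. The sketch below uses hypothetical priors and a hypothetical threshold; the function names are ours.

```python
def posteriors_from_lr(lr1, lr2, priors):
    """Eq. (61): a posteriori class probabilities from the two likelihood ratios."""
    r1 = lr1 * priors[0] / priors[2]
    r2 = lr2 * priors[1] / priors[2]
    denom = 1.0 + r1 + r2
    return r1 / denom, r2 / denom, 1.0 / denom


def line_66_residual(lr1, lr2, gamma1, priors):
    """Left side minus right side of Eq. (66).

    Its sign matches the sign of P(pi_1|x) - gamma1, because the two differ
    only by the positive factor P(pi_1)LR1 + P(pi_2)LR2 + P(pi_3).
    """
    return ((1.0 - gamma1) * priors[0] * lr1
            - gamma1 * priors[1] * lr2
            - gamma1 * priors[2])


priors = [0.2, 0.3, 0.5]
gamma1 = 0.4
for lr1, lr2 in [(1.0, 1.0), (10.0, 1.0), (3.0, 0.2), (0.5, 4.0)]:
    q1, q2, q3 = posteriors_from_lr(lr1, lr2, priors)
    print((lr1, lr2), round(q1, 3), q1 > gamma1,
          line_66_residual(lr1, lr2, gamma1, priors) > 0.0)
```

When both likelihood ratios equal 1 the data carry no information, and the posteriors returned by `posteriors_from_lr` reduce to the priors.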


Although the above equations for the decision boundary lines are notably more complicated
than those of the previous three sections, we can still relate the parameters $\gamma_1$ and $\gamma_2$ to the
decision rule parameters of (5)-(7) to obtain constraints on the utilities $U_{i|j}$. For example,
comparison of (66) with (6) gives

$$\begin{aligned} \gamma_{1232} &= -\gamma_1 P(\pi_2) \\ U_{1|2} - U_{3|2} &= -\gamma_1 \end{aligned} \tag{68}$$

$$\begin{aligned} \gamma_{3313} &= \gamma_1 P(\pi_3) \\ U_{3|3} - U_{1|3} &= \gamma_1. \end{aligned} \tag{69}$$

This immediately gives the constraint

$$-(U_{1|2} - U_{3|2}) = U_{3|3} - U_{1|3}. \tag{70}$$

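Similarly, comparison of (67) and (7) gives

$$\begin{aligned} \gamma_{2131} &= -\gamma_2 P(\pi_1) \\ U_{2|1} - U_{3|1} &= -\gamma_2; \end{aligned} \tag{71}$$

$$\begin{aligned} \gamma_{3323} &= \gamma_2 P(\pi_3) \\ U_{3|3} - U_{2|3} &= \gamma_2; \end{aligned} \tag{72}$$

yielding the constraint

$$-(U_{2|1} - U_{3|1}) = U_{3|3} - U_{2|3}. \tag{73}$$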

Finally, we add the first two coefficients of (65) and then compare with (5) to obtain

$$\begin{aligned} [1 - (\gamma_1 - \gamma_2)] - [1 + (\gamma_1 - \gamma_2)] &= -2(\gamma_1 - \gamma_2) \\ (U_{1|1} - U_{2|1}) - (U_{2|2} - U_{1|2}) &= -2(U_{2|3} - U_{1|3}). \end{aligned} \tag{74}$$

(On the right hand side of the above equation, we have made use of (69) and (72).) Note that
the remaining terms in (65)-(67) involving $\gamma_1$ or $\gamma_2$ are simply differences of terms already
considered, and would thus yield no further constraints on the utilities.

We can now impose constraints (70), (73), and (74) on the general expression (2) for expected
utility to obtain the expected utility for this observer:

$$\begin{aligned}
E\{U_{\text{Scurfield:AP}}\} ={}& [U_{1|1}P_{11} + U_{2|1}(1 - P_{11} - P_{31}) + U_{3|1}P_{31}]P(\pi_1) \\
& + [U_{1|2}(1 - P_{22} - P_{32}) + U_{2|2}P_{22} + U_{3|2}P_{32}]P(\pi_2) \\
& + [U_{1|3}P_{13} + U_{2|3}P_{23} + U_{3|3}(1 - P_{13} - P_{23})]P(\pi_3) \\
={}& [(U_{1|1} - U_{2|1})P_{11} - (U_{2|1} - U_{3|1})P_{31} + U_{2|1}]P(\pi_1) \\
& + [(U_{2|2} - U_{1|2})P_{22} - (U_{1|2} - U_{3|2})P_{32} + U_{1|2}]P(\pi_2) \\
& + [-(U_{3|3} - U_{1|3})P_{13} - (U_{3|3} - U_{2|3})P_{23} + U_{3|3}]P(\pi_3) \\
={}& \{(U_{1|1} - U_{2|1})P_{11} + (U_{3|3} - U_{2|3})P_{31} + U_{2|1}\}P(\pi_1) \\
& + \{[(U_{1|1} - U_{2|1}) + 2(U_{2|3} - U_{1|3})]P_{22} + (U_{3|3} - U_{1|3})P_{32} + U_{1|2}\}P(\pi_2) \\
& + \{-(U_{3|3} - U_{1|3})P_{13} - (U_{3|3} - U_{2|3})P_{23} + U_{3|3}\}P(\pi_3) \\
={}& U_{2|1}P(\pi_1) + U_{1|2}P(\pi_2) + U_{3|3}P(\pi_3) \\
& + (U_{1|1} - U_{2|1})[P(\pi_1)P_{11} + P(\pi_2)P_{22}] \\
& + (U_{3|3} - U_{1|3})[P(\pi_2)P_{32} + 2P(\pi_2)P_{22} - P(\pi_3)P_{13}] \\
& + (U_{3|3} - U_{2|3})[P(\pi_1)P_{31} - 2P(\pi_2)P_{22} - P(\pi_3)P_{23}].
\end{aligned} \tag{75}$$


As  was  the  case  for  the  decision  strategies  of  the  preceding  three  sections,  the  expected  utility  of 
this  observer  (and  thus  its  performance,  as  it  too  is  a  special  case  of  the  ideal  observer)  depends 
on  only  three  quantities  related  to  conditional  classification  rates  (but  not  the  observer’s  decision 
utilities),  namely  the  quantities  in  square  brackets  in  (75). 
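
The reduction to (75) can be checked numerically. The sketch below is a hedged illustration, not part of the original derivation: it assumes that the general expression (2) is the sum of $U_{i|j}P_{ij}P(\pi_j)$ over all decisions $i$ and classes $j$, builds a hypothetical set of utilities satisfying (70), (73), and (74), and compares that sum with the final form of (75) for randomly drawn conditional classification rates. All function and parameter names are made up for the example.

```python
import random


def constrained_utilities(u21, u12, u33, a, b, c):
    """Build U[(i, j)] = U_{i|j} satisfying constraints (70), (73) and (74).

    a = U_{1|1} - U_{2|1}, b = U_{3|3} - U_{1|3}, c = U_{3|3} - U_{2|3}.
    """
    return {
        (1, 1): u21 + a, (2, 1): u21, (3, 1): u21 + c,                 # (73)
        (1, 2): u12, (2, 2): u12 + a + 2 * (b - c), (3, 2): u12 + b,   # (74), (70)
        (1, 3): u33 - b, (2, 3): u33 - c, (3, 3): u33,
    }


def random_rates(rng):
    """Return P[(i, j)] = P(decide pi_i | class pi_j); each column sums to 1."""
    rates = {}
    for j in (1, 2, 3):
        draws = [rng.uniform(0.1, 1.0) for _ in range(3)]
        total = sum(draws)
        for i in (1, 2, 3):
            rates[(i, j)] = draws[i - 1] / total
    return rates


def general_expected_utility(U, P, priors):
    """Assumed form of (2): sum of U_{i|j} P_{ij} P(pi_j)."""
    return sum(U[(i, j)] * P[(i, j)] * priors[j - 1]
               for i in (1, 2, 3) for j in (1, 2, 3))


def reduced_expected_utility(U, P, priors):
    """Final form of (75): constant term plus three bracketed quantities."""
    p1, p2, p3 = priors
    q1 = p1 * P[(1, 1)] + p2 * P[(2, 2)]
    q2 = p2 * P[(3, 2)] + 2 * p2 * P[(2, 2)] - p3 * P[(1, 3)]
    q3 = p1 * P[(3, 1)] - 2 * p2 * P[(2, 2)] - p3 * P[(2, 3)]
    constant = U[(2, 1)] * p1 + U[(1, 2)] * p2 + U[(3, 3)] * p3
    return (constant
            + (U[(1, 1)] - U[(2, 1)]) * q1
            + (U[(3, 3)] - U[(1, 3)]) * q2
            + (U[(3, 3)] - U[(2, 3)]) * q3)


rng = random.Random(1)
priors = (0.2, 0.3, 0.5)  # hypothetical priors
U = constrained_utilities(u21=0.1, u12=-0.2, u33=1.0, a=0.9, b=0.6, c=0.35)
P = random_rates(rng)
difference = abs(general_expected_utility(U, P, priors)
                 - reduced_expected_utility(U, P, priors))
```

Under the stated assumptions, `difference` should vanish up to floating-point rounding for any valid set of classification rates.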

The first quantity, being a weighted sum of “sensitivities” with positive weights, is immediately
seen to be quite suitable for the dependent variable of an ROC surface: a higher value of
this quantity is clearly preferable to a lower one. (Indeed, $P(\pi_1)P_{11} + P(\pi_2)P_{22}$ has an intuitive
interpretation as the probability of a randomly drawn observation being both (i) from either
class $\pi_1$ or $\pi_2$ and also (ii) correctly classified as such. Compare the corresponding quantity
$P_{11} + P_{22}$ from Sec. IV, which is technically not even a probability.) The other two quantities
in square brackets in (75) discourage any such straightforward interpretation, but this is perhaps
to be expected: the pleasantly symmetric form of the Scurfield decision rule of (39)-(41) in
this case holds in the $(P(\pi_1|\vec{x}), P(\pi_2|\vec{x}))$ decision variable plane; due to the complexity of the
transformation in (61), this symmetry will be lost in the likelihood ratio decision variable plane,
and the expression for expected utility will be correspondingly opaque.

In any case, we now consider evaluating the performance of an arbitrary observer in an
ROC-like space constructed from the quantities $P(\pi_1)P_{11} + P(\pi_2)P_{22}$,
$P(\pi_2)P_{32} + 2P(\pi_2)P_{22} - P(\pi_3)P_{13}$, and $P(\pi_1)P_{31} - 2P(\pi_2)P_{22} - P(\pi_3)P_{23}$.
We will define the ROC-like surface used to
evaluate observer performance as the first quantity considered as a function of the other two.
To find the optimal observer with respect to this restricted performance evaluation method, we
apply the Neyman-Pearson criterion to maximize $P(\pi_1)P_{11} + P(\pi_2)P_{22}$ subject to the constraints
$P(\pi_2)P_{32} + 2P(\pi_2)P_{22} - P(\pi_3)P_{13} = \alpha_1$ and
$P(\pi_1)P_{31} - 2P(\pi_2)P_{22} - P(\pi_3)P_{23} = \alpha_2$. We define
the function

$$\begin{aligned}
F_{\text{Scurfield:AP}} ={}& P(\pi_1)P_{11} + P(\pi_2)P_{22} \\
& + \lambda_1[P(\pi_2)P_{32} + 2P(\pi_2)P_{22} - P(\pi_3)P_{13} - \alpha_1] \\
& + \lambda_2[P(\pi_1)P_{31} - 2P(\pi_2)P_{22} - P(\pi_3)P_{23} - \alpha_2],
\end{aligned} \tag{76}$$

where $\lambda_1$ and $\lambda_2$ are the Lagrange multipliers.
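
To make the three ROC-like quantities and the function in (76) concrete, the following sketch evaluates them from a matrix of conditional classification rates. The function names, the matrix layout (`rates[i][j]` holding $P_{i+1,j+1}$), the priors, and the example observer are assumptions made for illustration only.

```python
def roc_like_quantities(rates, priors):
    """Return the three bracketed quantities of (75).

    rates[i][j] = P(decide pi_{i+1} | class pi_{j+1}); each column sums to 1.
    """
    p1, p2, p3 = priors
    q_dep = p1 * rates[0][0] + p2 * rates[1][1]
    q_a = p2 * rates[2][1] + 2 * p2 * rates[1][1] - p3 * rates[0][2]
    q_b = p1 * rates[2][0] - 2 * p2 * rates[1][1] - p3 * rates[1][2]
    return q_dep, q_a, q_b


def lagrangian(rates, priors, lam1, lam2, alpha1, alpha2):
    """Evaluate the function defined in (76) for a given observer."""
    q_dep, q_a, q_b = roc_like_quantities(rates, priors)
    return q_dep + lam1 * (q_a - alpha1) + lam2 * (q_b - alpha2)


# Hypothetical priors and an error-free observer (identity rate matrix).
priors = (0.2, 0.3, 0.5)
perfect = [[1.0, 0.0, 0.0],
           [0.0, 1.0, 0.0],
           [0.0, 0.0, 1.0]]
q_dep, q_a, q_b = roc_like_quantities(perfect, priors)
f_value = lagrangian(perfect, priors, 0.8, 0.5, q_a, q_b)
```

For this error-free observer the first quantity equals $P(\pi_1) + P(\pi_2)$, while the third is negative, which illustrates why the latter two quantities cannot be read as probabilities. When the constraints are met exactly, the function in (76) reduces to the first quantity.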

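Using (12), this can be simplified to yield

$$\begin{aligned}
F_{\text{Scurfield:AP}} ={}& -\lambda_1\alpha_1 - \lambda_2\alpha_2 + P(\pi_1)\int_{Z_1} p(\vec{x}|\pi_1)\,d^m\vec{x} + P(\pi_2)\int_{Z_2} p(\vec{x}|\pi_2)\,d^m\vec{x} \\
& + \lambda_1\left[P(\pi_2)\int_{Z_3} p(\vec{x}|\pi_2)\,d^m\vec{x} + 2P(\pi_2)\int_{Z_2} p(\vec{x}|\pi_2)\,d^m\vec{x} - P(\pi_3)\int_{Z_1} p(\vec{x}|\pi_3)\,d^m\vec{x}\right] \\
& + \lambda_2\left[P(\pi_1)\int_{Z_3} p(\vec{x}|\pi_1)\,d^m\vec{x} - 2P(\pi_2)\int_{Z_2} p(\vec{x}|\pi_2)\,d^m\vec{x} - P(\pi_3)\int_{Z_2} p(\vec{x}|\pi_3)\,d^m\vec{x}\right].
\end{aligned} \tag{77}$$

Collecting terms with given domains of integration yields

$$\begin{aligned}
F_{\text{Scurfield:AP}} ={}& -\lambda_1\alpha_1 - \lambda_2\alpha_2 \\
& + \int_{Z_1} \left[P(\pi_1)p(\vec{x}|\pi_1) - \lambda_1 P(\pi_3)p(\vec{x}|\pi_3)\right] d^m\vec{x} \\
& + \int_{Z_2} \left[P(\pi_2)p(\vec{x}|\pi_2) + 2(\lambda_1 - \lambda_2)P(\pi_2)p(\vec{x}|\pi_2) - \lambda_2 P(\pi_3)p(\vec{x}|\pi_3)\right] d^m\vec{x} \\
& + \int_{Z_3} \left[\lambda_1 P(\pi_2)p(\vec{x}|\pi_2) + \lambda_2 P(\pi_1)p(\vec{x}|\pi_1)\right] d^m\vec{x}.
\end{aligned} \tag{78}$$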
$F_{\text{Scurfield:AP}}$ can be maximized by assigning a given $\vec{x}$ to the region $Z_i$ such that the integrand
over $Z_i$ in (78) is maximized. (Situations in which two or more of the integrands yield the same
maximal value for a given $\vec{x}$ can be decided in an arbitrary but consistent fashion.)

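That is,

$$\begin{aligned}
\text{decide } \pi_1 \text{ iff } & P(\pi_1)p(\vec{x}|\pi_1) - \lambda_1 P(\pi_3)p(\vec{x}|\pi_3) \\
& \quad > P(\pi_2)p(\vec{x}|\pi_2) + 2(\lambda_1 - \lambda_2)P(\pi_2)p(\vec{x}|\pi_2) - \lambda_2 P(\pi_3)p(\vec{x}|\pi_3) \\
\text{and } & P(\pi_1)p(\vec{x}|\pi_1) - \lambda_1 P(\pi_3)p(\vec{x}|\pi_3) \\
& \quad > \lambda_1 P(\pi_2)p(\vec{x}|\pi_2) + \lambda_2 P(\pi_1)p(\vec{x}|\pi_1)
\end{aligned} \tag{79}$$

$$\begin{aligned}
\text{decide } \pi_2 \text{ iff } & P(\pi_2)p(\vec{x}|\pi_2) + 2(\lambda_1 - \lambda_2)P(\pi_2)p(\vec{x}|\pi_2) - \lambda_2 P(\pi_3)p(\vec{x}|\pi_3) \\
& \quad > P(\pi_1)p(\vec{x}|\pi_1) - \lambda_1 P(\pi_3)p(\vec{x}|\pi_3) \\
\text{and } & P(\pi_2)p(\vec{x}|\pi_2) + 2(\lambda_1 - \lambda_2)P(\pi_2)p(\vec{x}|\pi_2) - \lambda_2 P(\pi_3)p(\vec{x}|\pi_3) \\
& \quad > \lambda_1 P(\pi_2)p(\vec{x}|\pi_2) + \lambda_2 P(\pi_1)p(\vec{x}|\pi_1)
\end{aligned} \tag{80}$$

$$\begin{aligned}
\text{decide } \pi_3 \text{ iff } & \lambda_1 P(\pi_2)p(\vec{x}|\pi_2) + \lambda_2 P(\pi_1)p(\vec{x}|\pi_1) > P(\pi_1)p(\vec{x}|\pi_1) - \lambda_1 P(\pi_3)p(\vec{x}|\pi_3) \\
\text{and } & \lambda_1 P(\pi_2)p(\vec{x}|\pi_2) + \lambda_2 P(\pi_1)p(\vec{x}|\pi_1) \\
& \quad > P(\pi_2)p(\vec{x}|\pi_2) + 2(\lambda_1 - \lambda_2)P(\pi_2)p(\vec{x}|\pi_2) - \lambda_2 P(\pi_3)p(\vec{x}|\pi_3).
\end{aligned} \tag{81}$$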

At this point, we could divide the above equations by $p(\vec{x}|\pi_3)$ to obtain decision rules in terms of
the likelihood ratios, as in the preceding sections. However, it is in this case more convenient to
work with the a posteriori class membership probabilities directly; moreover, because we have
established that (58)-(60) represent the boundary lines of an ideal observer decision rule, we
are justified in doing so. Thus, given that $P(\pi_i)p(\vec{x}|\pi_i) = P(\pi_i|\vec{x})p(\vec{x})$, we divide (79)-(81) by
$p(\vec{x})$ to obtain

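$$\begin{aligned}
\text{decide } \pi_1 \text{ iff } & P(\pi_1|\vec{x}) - \lambda_1 P(\pi_3|\vec{x}) \\
& \quad > P(\pi_2|\vec{x}) + 2(\lambda_1 - \lambda_2)P(\pi_2|\vec{x}) - \lambda_2 P(\pi_3|\vec{x}) \\
\text{and } & P(\pi_1|\vec{x}) - \lambda_1 P(\pi_3|\vec{x}) \\
& \quad > \lambda_1 P(\pi_2|\vec{x}) + \lambda_2 P(\pi_1|\vec{x})
\end{aligned} \tag{82}$$

$$\begin{aligned}
\text{decide } \pi_2 \text{ iff } & P(\pi_2|\vec{x}) + 2(\lambda_1 - \lambda_2)P(\pi_2|\vec{x}) - \lambda_2 P(\pi_3|\vec{x}) \\
& \quad > P(\pi_1|\vec{x}) - \lambda_1 P(\pi_3|\vec{x}) \\
\text{and } & P(\pi_2|\vec{x}) + 2(\lambda_1 - \lambda_2)P(\pi_2|\vec{x}) - \lambda_2 P(\pi_3|\vec{x}) \\
& \quad > \lambda_1 P(\pi_2|\vec{x}) + \lambda_2 P(\pi_1|\vec{x})
\end{aligned} \tag{83}$$

$$\begin{aligned}
\text{decide } \pi_3 \text{ iff } & \lambda_1 P(\pi_2|\vec{x}) + \lambda_2 P(\pi_1|\vec{x}) > P(\pi_1|\vec{x}) - \lambda_1 P(\pi_3|\vec{x}) \\
\text{and } & \lambda_1 P(\pi_2|\vec{x}) + \lambda_2 P(\pi_1|\vec{x}) \\
& \quad > P(\pi_2|\vec{x}) + 2(\lambda_1 - \lambda_2)P(\pi_2|\vec{x}) - \lambda_2 P(\pi_3|\vec{x}).
\end{aligned} \tag{84}$$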

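As noted at the beginning of this section, $P(\pi_3|\vec{x}) = 1 - P(\pi_1|\vec{x}) - P(\pi_2|\vec{x})$. After rearranging
terms, the boundary lines which partition the $(P(\pi_1|\vec{x}), P(\pi_2|\vec{x}))$ decision variable plane into
the regions $Z_1$, $Z_2$, and $Z_3$ are found to be

$$(1 + \lambda_1 - \lambda_2)P(\pi_1|\vec{x}) - (1 + \lambda_1 - \lambda_2)P(\pi_2|\vec{x}) = \lambda_1 - \lambda_2 \quad \{\text{“1-vs.-2”}\} \tag{85}$$

$$(1 + \lambda_1 - \lambda_2)P(\pi_1|\vec{x}) = \lambda_1 \quad \{\text{“1-vs.-3”}\} \tag{86}$$

$$(1 + \lambda_1 - \lambda_2)P(\pi_2|\vec{x}) = \lambda_2 \quad \{\text{“2-vs.-3”}\}. \tag{87}$$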

If we define the quantities $\gamma_1 = \lambda_1/(1 + \lambda_1 - \lambda_2)$ and $\gamma_2 = \lambda_2/(1 + \lambda_1 - \lambda_2)$, and further
require $0 < \lambda_1$ and $0 < \lambda_2 < \min\{1, (\lambda_1 + 1)/2\}$ (so that $0 < \gamma_1 < 1$ and $0 < \gamma_2 < 1$), then
the resulting decision strategy is found to be identical to that stated in (58)-(60). This special
case of the observer proposed by Scurfield, which we have shown to be a special case of the
ideal observer [14], has a performance that depends only on the quantities $P(\pi_1)P_{11} + P(\pi_2)P_{22}$,
$P(\pi_2)P_{32} + 2P(\pi_2)P_{22} - P(\pi_3)P_{13}$, and $P(\pi_1)P_{31} - 2P(\pi_2)P_{22} - P(\pi_3)P_{23}$ by (75). The observer
described above is indeed that which obtains optimal performance with respect to this set of
quantities related to the conditional classification rates.
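
The equivalence between the Neyman-Pearson rule and the threshold rule can be spot-checked numerically. The sketch below is illustrative only: the function names and the multiplier values are hypothetical, the first rule assigns an observation to the region whose integrand in (78), divided by $p(\vec{x})$, is largest, and the second applies the boundary lines (85)-(87) written in terms of $\gamma_1$ and $\gamma_2$, with ties broken arbitrarily.

```python
import random


def gammas_from_lambdas(lam1, lam2):
    """Convert Lagrange multipliers to the thresholds gamma_1 and gamma_2."""
    if not (lam1 > 0 and 0 < lam2 < min(1.0, (lam1 + 1.0) / 2.0)):
        raise ValueError("multipliers outside the range required in the text")
    denom = 1.0 + lam1 - lam2
    return lam1 / denom, lam2 / denom


def decide_by_integrands(post, lam1, lam2):
    """Pick the class whose integrand in (78), divided by p(x), is largest."""
    p1, p2, p3 = post
    integrands = (
        p1 - lam1 * p3,
        p2 + 2 * (lam1 - lam2) * p2 - lam2 * p3,
        lam1 * p2 + lam2 * p1,
    )
    return 1 + max(range(3), key=lambda i: integrands[i])


def decide_by_thresholds(post, gamma1, gamma2):
    """Apply the boundary lines P1 - P2 = g1 - g2, P1 = g1 and P2 = g2."""
    p1, p2, _ = post
    if p1 > gamma1 and p1 - p2 > gamma1 - gamma2:
        return 1
    if p2 > gamma2 and p1 - p2 <= gamma1 - gamma2:
        return 2
    return 3


def count_disagreements(lam1, lam2, n_trials=2000, seed=0):
    """Count random posterior triples on which the two rules disagree."""
    rng = random.Random(seed)
    gamma1, gamma2 = gammas_from_lambdas(lam1, lam2)
    mismatches = 0
    for _ in range(n_trials):
        # Draw a random point on the probability simplex.
        u, v = sorted((rng.random(), rng.random()))
        post = (u, v - u, 1.0 - v)
        mismatches += (decide_by_integrands(post, lam1, lam2)
                       != decide_by_thresholds(post, gamma1, gamma2))
    return mismatches


gamma1, gamma2 = gammas_from_lambdas(0.8, 0.5)  # hypothetical multipliers
mismatches = count_disagreements(0.8, 0.5)
```

The agreement relies on $1 + \lambda_1 - \lambda_2$ being positive, which the stated range of the multipliers guarantees; outside that range the helper raises an error rather than returning thresholds.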


VI.  Conclusions 

Given the rapid increase in complexity of the utility constraints and performance evaluation
criteria as one proceeds from Secs. II to V, it is quite possible for the main point of the above
analyses to become obscured. That main point is that, for each of a variety of constrained
special cases of the three-class ideal observer, the performance of that observer is completely
describable, in an expected-utility sense, by only two decision criteria and three quantities related
to conditional classification rates. This represents a considerable simplification from the general
model, which is known to involve five decision criteria and six conditional classification rates.

It  should  be  immediately  acknowledged  that  such  simplified  models  may  ultimately  prove  to 
be  of  limited  practical  importance.  Given  an  observer  known  to  closely  approximate  the  behavior 
of  the  ideal  observer,  or  indeed  given  a  human  observer,  it  is  difficult  to  conceive  of  a  pragmatic 
way  to  externally  constrain  the  observer’s  decision  utilities  to  match  a  particular  model  such  as 
the  ones  described  above.  On  the  other  hand,  an  algorithmic  observer  (such  as  an  implementation 
of  a  computerized  scheme  for  computer-aided  diagnosis)  might  readily  allow  such  constraints 
on  its  decision  rules  to  be  implemented;  however,  the  assumption  that  the  probability  density 
functions  of  the  decision  variables  generated  by  the  scheme  do  indeed  follow  those  required 
by  the  ideal  observer  model  would  generally  be  unverifiable,  given  the  limited  amount  of  data 
typically  available  for  training  and  testing  such  a  scheme. 

Despite  these  limitations,  it  remains  an  acknowledged  fact  that  a  fully  general  extension  of 
ROC  analysis  to  classification  tasks  with  three  or  more  classes  has  yet  to  be  developed.  Although 
the  investigation  of  constrained  and  therefore  tractable  observer  models  should  not  be  considered 
an  end  unto  itself,  a  thorough  understanding  of  such  models  is  almost  certain  to  prove  necessary 
for  the  development  of  more  general  observer  models.  We  believe  that  demonstrating  particular 
constrained  ideal  observer  models  to  be  complete  as  well  as  tractable  will  be  a  crucial  step 
toward  this  understanding. 